Postnatal environmental exposures, particularly those found in household products and dietary intake, along with specific serum metabolomics profiles, are significantly associated with the BMI Z-score of children aged 6-11 years. Higher concentrations of certain metabolites in serum, reflecting exposure to chemical classes or metals, will correlate with variations in BMI Z-score, controlling for age and other relevant covariates. Some metabolites associated with chemical exposures and dietary patterns can serve as biomarkers for the risk of developing obesity.
Research indicates that postnatal exposure to endocrine-disrupting chemicals (EDCs) such as phthalates, bisphenol A (BPA), and polychlorinated biphenyls (PCBs) can significantly influence body weight and metabolic health (Junge et al., 2018). These chemicals, commonly found in household products and absorbed through dietary intake, can interfere with the hormonal signaling that regulates metabolism and fat storage. This hormonal interference can lead to an increased body mass index (BMI) in children, suggesting a potential pathway through which exposure to these chemicals contributes to the development of obesity.
A longitudinal study on Japanese children examined the impact of postnatal exposure (first two years of life) to p,p’-dichlorodiphenyltrichloroethane (p,p’-DDT) and p,p’-dichlorodiphenyldichloroethylene (p,p’-DDE) through breastfeeding (Plouffe et al., 2020). The findings revealed that higher levels of these chemicals in breast milk were associated with increased BMI at 42 months of age. DDT and DDE may interfere with hormonal pathways related to growth and development. These chemicals can mimic or disrupt hormones that regulate metabolism and fat accumulation. This study highlights the importance of understanding how persistent organic pollutants can affect early childhood growth and development.
The study by Harley et al. (2013) investigated the association between prenatal and postnatal bisphenol A (BPA) exposure and various body composition metrics in 9-year-old children from the CHAMACOS cohort. The study found that higher prenatal BPA exposure was linked to lower BMI and body fat percentage in girls but not boys, suggesting sex-specific effects. Conversely, BPA levels measured at age 9 were positively associated with adiposity in both sexes, highlighting how the timing of exposure can change its relationship with childhood development.
The 2022 study by Uldbjerg et al. explored the effects of combined exposures to multiple EDCs, suggesting that mixtures of these chemicals can have additive or synergistic effects on BMI and obesity risk. Humans are typically exposed to a mixture of chemicals rather than individual EDCs, making it crucial to understand how these mixtures might interact. The research highlighted that the interaction between different EDCs can lead to additive (where the effects simply add up) or even synergistic (where the combined effect is greater than the sum of their separate effects) outcomes. These interactions can significantly amplify the risk factors associated with obesity and metabolic disorders in children. The dose-response analysis suggested that even low-level exposure to multiple EDCs could result in significant health impacts due to their combined effects.
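The distinction between additive and synergistic effects can be made concrete with a regression interaction term. The sketch below is illustrative only: it simulates two hypothetical standardized exposures (`edc_a`, `edc_b`) and an outcome with a built-in synergy, then fits a linear model with an interaction. None of the names or effect sizes come from Uldbjerg et al. or from HELIX. Under a purely additive model the `edc_a:edc_b` coefficient would be close to zero.

```r
# Simulated illustration only: variable names and effect sizes are hypothetical.
set.seed(1)
n <- 5000
edc_a <- rnorm(n)  # standardized log-concentration of a first chemical
edc_b <- rnorm(n)  # standardized log-concentration of a second chemical

# Outcome with additive main effects (0.2 each) plus a synergistic term (0.5)
zbmi <- 0.2 * edc_a + 0.2 * edc_b + 0.5 * edc_a * edc_b + rnorm(n)

# `edc_a * edc_b` expands to both main effects plus their interaction
fit <- lm(zbmi ~ edc_a * edc_b)
interaction_est <- unname(coef(fit)["edc_a:edc_b"])

# An interaction estimate clearly above zero points to a
# more-than-additive joint effect on this scale
print(round(coef(fit), 2))
```

Whether an effect counts as additive or synergistic depends on the scale of the model, so a term like this is best read alongside the main effects rather than on its own.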
These studies collectively illustrate the critical role of environmental EDCs in shaping metabolic health outcomes in children, highlighting the necessity for ongoing research and policy intervention to mitigate these risks.
This study will use data from the HELIX subcohort of 1301 mother-child pairs, with children aged 6-11 years, for whom complete exposure and outcome data were available. Exposure data include detailed dietary records collected after pregnancy and concentrations of various chemicals, such as BPA and PCBs, in child blood samples. The dataset contains both categorical and numerical variables, covering demographic details and biochemical measurements. It allows for robust statistical analysis to identify potential associations between EDC exposure and changes in BMI Z-scores, accounting for confounding factors such as age, sex, and socioeconomic status. There are no missing data, so no imputation is needed. Child BMI Z-scores were calculated based on WHO growth standards.
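As a rough sketch of the kind of covariate-adjusted model this design supports, the example below regresses a BMI Z-score on one log2-transformed exposure while adjusting for age and sex. The data are simulated and the column names are hypothetical stand-ins; this is not the analysis pipeline of this report. Because the exposure is on a log2 scale, its coefficient reads as the expected change in Z-score per doubling of concentration.

```r
# Simulated illustration only: all columns and effect sizes are hypothetical.
set.seed(42)
n <- 1301  # matches the subcohort size, but the values are simulated
toy <- data.frame(
  exposure_log2 = rnorm(n, mean = 0, sd = 1.5),
  child_age     = runif(n, min = 6, max = 11),
  child_sex     = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Assumed true effect: -0.10 Z-score units per doubling of the exposure
toy$zbmi <- -0.10 * toy$exposure_log2 + 0.05 * toy$child_age + rnorm(n)

# Adjusted linear model: exposure effect conditional on age and sex
fit <- lm(zbmi ~ exposure_log2 + child_age + child_sex, data = toy)
beta <- unname(coef(fit)["exposure_log2"])
ci <- confint(fit)["exposure_log2", ]

print(round(c(estimate = beta, ci), 3))
```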
load("/Users/allison/Library/CloudStorage/GoogleDrive-aflouie@usc.edu/My Drive/HELIX_data/HELIX.RData")
# postnatal chemical and lifestyle exposures (allergens excluded)
filtered_chem_diet <- codebook %>%
  filter(domain %in% c("Chemicals", "Lifestyles") & period == "Postnatal" & subfamily != "Allergens")
# specific covariates
filtered_covariates <- codebook %>%
  filter(domain == "Covariates" &
    variable_name %in% c("ID", "e3_sex_None", "e3_yearbir_None", "h_edumc_None", "h_cohort", "hs_child_age_None"))
# specific phenotype variables
filtered_phenotype <- codebook %>%
  filter(domain == "Phenotype" &
    variable_name %in% c("hs_zbmi_who"))
# combining all necessary variables together
combined_codebook <- bind_rows(filtered_chem_diet, filtered_covariates, filtered_phenotype)
kable(combined_codebook, align = "c", format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
| | variable_name | domain | family | subfamily | period | location | period_postnatal | description | var_type | transformation | labels | labelsshort |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| h_bfdur_Ter | h_bfdur_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Breastfeeding duration (weeks) | factor | Tertiles | Breastfeeding | Breastfeeding |
| hs_bakery_prod_Ter | hs_bakery_prod_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: bakery products (hs_cookies + hs_pastries) | factor | Tertiles | Bakery prod | BakeProd |
| hs_beverages_Ter | hs_beverages_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: beverages (hs_dietsoda+hs_soda) | factor | Tertiles | Soda | Soda |
| hs_break_cer_Ter | hs_break_cer_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: breakfast cereal (hs_sugarcer+hs_othcer) | factor | Tertiles | BF cereals | BFcereals |
| hs_caff_drink_Ter | hs_caff_drink_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Drinks a caffeinated or energy drink (eg coca-cola, diet-coke, redbull) | factor | Tertiles | Caffeine | Caffeine |
| hs_dairy_Ter | hs_dairy_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: dairy (hs_cheese + hs_milk + hs_yogurt+ hs_probiotic+ hs_desert) | factor | Tertiles | Dairy | Dairy |
| hs_fastfood_Ter | hs_fastfood_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Visits a fast food restaurant/take away | factor | Tertiles | Fastfood | Fastfood |
| hs_KIDMED_None | hs_KIDMED_None | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Sum of KIDMED indices, without index9 | numeric | None | KIDMED | KIDMED |
| hs_mvpa_prd_alt_None | hs_mvpa_prd_alt_None | Lifestyles | Lifestyle | Physical activity | Postnatal | NA | NA | Clean & Over-reporting of Moderate-to-Vigorous Physical Activity (min/day) | numeric | None | PA | PA |
| hs_org_food_Ter | hs_org_food_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Eats organic food | factor | Tertiles | Organicfood | Organicfood |
| hs_proc_meat_Ter | hs_proc_meat_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: processed meat (hs_coldmeat+hs_ham) | factor | Tertiles | Processed meat | ProcMeat |
| hs_readymade_Ter | hs_readymade_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Eats a ready-made supermarket meal | factor | Tertiles | Ready made food | ReadyFood |
| hs_sd_wk_None | hs_sd_wk_None | Lifestyles | Lifestyle | Physical activity | Postnatal | NA | NA | sedentary behaviour (min/day) | numeric | None | Sedentary | Sedentary |
| hs_total_bread_Ter | hs_total_bread_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: bread (hs_darkbread+hs_whbread) | factor | Tertiles | Bread | Bread |
| hs_total_cereal_Ter | hs_total_cereal_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: cereal (hs_darkbread + hs_whbread + hs_rice_pasta + hs_sugarcer + hs_othcer + hs_rusks) | factor | Tertiles | Cereals | Cereals |
| hs_total_fish_Ter | hs_total_fish_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: fish and seafood (hs_canfish+hs_oilyfish+hs_whfish+hs_seafood) | factor | Tertiles | Fish | Fish |
| hs_total_fruits_Ter | hs_total_fruits_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: fruits (hs_canfruit+hs_dryfruit+hs_freshjuice+hs_fruits) | factor | Tertiles | Fruits | Fruits |
| hs_total_lipids_Ter | hs_total_lipids_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: Added fat | factor | Tertiles | Diet fat | Diet fat |
| hs_total_meat_Ter | hs_total_meat_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: meat (hs_coldmeat+hs_ham+hs_poultry+hs_redmeat) | factor | Tertiles | Meat | Meat |
| hs_total_potatoes_Ter | hs_total_potatoes_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: potatoes (hs_frenchfries+hs_potatoes) | factor | Tertiles | Potatoes | Potatoes |
| hs_total_sweets_Ter | hs_total_sweets_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: sweets (hs_choco + hs_sweets + hs_sugar) | factor | Tertiles | Sweets | Sweets |
| hs_total_veg_Ter | hs_total_veg_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: vegetables (hs_cookveg+hs_rawveg) | factor | Tertiles | Vegetables | Vegetables |
| hs_total_yog_Ter | hs_total_yog_Ter | Lifestyles | Lifestyle | Diet | Postnatal | NA | NA | Food group: yogurt (hs_yogurt+hs_probiotic) | factor | Tertiles | Yogurt | Yogurt |
| hs_dif_hours_total_None | hs_dif_hours_total_None | Lifestyles | Lifestyle | Sleep | Postnatal | NA | NA | Total hours of sleep (mean weekdays and night) | numeric | None | Sleep | Sleep |
| hs_as_c_Log2 | hs_as_c_Log2 | Chemicals | Metals | As | Postnatal | NA | NA | Arsenic (As) in child | numeric | Logarithm base 2 | As | As |
| hs_cd_c_Log2 | hs_cd_c_Log2 | Chemicals | Metals | Cd | Postnatal | NA | NA | Cadmium (Cd) in child | numeric | Logarithm base 2 | Cd | Cd |
| hs_co_c_Log2 | hs_co_c_Log2 | Chemicals | Metals | Co | Postnatal | NA | NA | Cobalt (Co) in child | numeric | Logarithm base 2 | Co | Co |
| hs_cs_c_Log2 | hs_cs_c_Log2 | Chemicals | Metals | Cs | Postnatal | NA | NA | Caesium (Cs) in child | numeric | Logarithm base 2 | Cs | Cs |
| hs_cu_c_Log2 | hs_cu_c_Log2 | Chemicals | Metals | Cu | Postnatal | NA | NA | Copper (Cu) in child | numeric | Logarithm base 2 | Cu | Cu |
| hs_hg_c_Log2 | hs_hg_c_Log2 | Chemicals | Metals | Hg | Postnatal | NA | NA | Mercury (Hg) in child | numeric | Logarithm base 2 | Hg | Hg |
| hs_mn_c_Log2 | hs_mn_c_Log2 | Chemicals | Metals | Mn | Postnatal | NA | NA | Manganese (Mn) in child | numeric | Logarithm base 2 | Mn | Mn |
| hs_mo_c_Log2 | hs_mo_c_Log2 | Chemicals | Metals | Mo | Postnatal | NA | NA | Molybdenum (Mo) in child | numeric | Logarithm base 2 | Mo | Mo |
| hs_pb_c_Log2 | hs_pb_c_Log2 | Chemicals | Metals | Pb | Postnatal | NA | NA | Lead (Pb) in child | numeric | Logarithm base 2 | Pb | Pb |
| hs_tl_cdich_None | hs_tl_cdich_None | Chemicals | Metals | Tl | Postnatal | NA | NA | Dichotomous variable of thallium (Tl) in child | factor | None | Tl | Tl |
| hs_dde_cadj_Log2 | hs_dde_cadj_Log2 | Chemicals | Organochlorines | DDE | Postnatal | NA | NA | Dichlorodiphenyldichloroethylene (DDE) in child adjusted for lipids | numeric | Logarithm base 2 | DDE | DDE |
| hs_ddt_cadj_Log2 | hs_ddt_cadj_Log2 | Chemicals | Organochlorines | DDT | Postnatal | NA | NA | Dichlorodiphenyltrichloroethane (DDT) in child adjusted for lipids | numeric | Logarithm base 2 | DDT | DDT |
| hs_hcb_cadj_Log2 | hs_hcb_cadj_Log2 | Chemicals | Organochlorines | HCB | Postnatal | NA | NA | Hexachlorobenzene (HCB) in child adjusted for lipids | numeric | Logarithm base 2 | HCB | HCB |
| hs_pcb118_cadj_Log2 | hs_pcb118_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl -118 (PCB-118) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 118 | PCB118 |
| hs_pcb138_cadj_Log2 | hs_pcb138_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-138 (PCB-138) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 138 | PCB138 |
| hs_pcb153_cadj_Log2 | hs_pcb153_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-153 (PCB-153) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 153 | PCB153 |
| hs_pcb170_cadj_Log2 | hs_pcb170_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-170 (PCB-170) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 170 | PCB170 |
| hs_pcb180_cadj_Log2 | hs_pcb180_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Polychlorinated biphenyl-180 (PCB-180) in child adjusted for lipids | numeric | Logarithm base 2 | PCB 180 | PCB180 |
| hs_sumPCBs5_cadj_Log2 | hs_sumPCBs5_cadj_Log2 | Chemicals | Organochlorines | PCBs | Postnatal | NA | NA | Sum of PCBs in child adjusted for lipids (4 cohorts) | numeric | Logarithm base 2 | PCBs | SumPCB |
| hs_dep_cadj_Log2 | hs_dep_cadj_Log2 | Chemicals | Organophosphate pesticides | DEP | Postnatal | NA | NA | Diethyl phosphate (DEP) in child adjusted for creatinine | numeric | Logarithm base 2 | DEP | DEP |
| hs_detp_cadj_Log2 | hs_detp_cadj_Log2 | Chemicals | Organophosphate pesticides | DETP | Postnatal | NA | NA | Diethyl thiophosphate (DETP) in child adjusted for creatinine | numeric | Logarithm base 2 | DETP | DETP |
| hs_dmdtp_cdich_None | hs_dmdtp_cdich_None | Chemicals | Organophosphate pesticides | DMDTP | Postnatal | NA | NA | Dichotomous variable of dimethyl dithiophosphate (DMDTP) in child | factor | None | DMDTP | DMDTP |
| hs_dmp_cadj_Log2 | hs_dmp_cadj_Log2 | Chemicals | Organophosphate pesticides | DMP | Postnatal | NA | NA | Dimethyl phosphate (DMP) in child adjusted for creatinine | numeric | Logarithm base 2 | DMP | DMP |
| hs_dmtp_cadj_Log2 | hs_dmtp_cadj_Log2 | Chemicals | Organophosphate pesticides | DMTP | Postnatal | NA | NA | Dimethyl thiophosphate (DMTP) in child adjusted for creatinine | numeric | Logarithm base 2 | DMDTP | DMTP |
| hs_pbde153_cadj_Log2 | hs_pbde153_cadj_Log2 | Chemicals | Polybrominated diphenyl ethers (PBDE) | PBDE153 | Postnatal | NA | NA | Polybrominated diphenyl ether-153 (PBDE-153) in child adjusted for lipids | numeric | Logarithm base 2 | PBDE 153 | PBDE153 |
| hs_pbde47_cadj_Log2 | hs_pbde47_cadj_Log2 | Chemicals | Polybrominated diphenyl ethers (PBDE) | PBDE47 | Postnatal | NA | NA | Polybrominated diphenyl ether-47 (PBDE-47) in child adjusted for lipids | numeric | Logarithm base 2 | PBDE 47 | PBDE47 |
| hs_pfhxs_c_Log2 | hs_pfhxs_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFHXS | Postnatal | NA | NA | Perfluorohexane sulfonate (PFHXS) in child | numeric | Logarithm base 2 | PFHXS | PFHXS |
| hs_pfna_c_Log2 | hs_pfna_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFNA | Postnatal | NA | NA | Perfluorononanoate (PFNA) in child | numeric | Logarithm base 2 | PFNA | PFNA |
| hs_pfoa_c_Log2 | hs_pfoa_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFOA | Postnatal | NA | NA | Perfluorooctanoate (PFOA) in child | numeric | Logarithm base 2 | PFOA | PFOA |
| hs_pfos_c_Log2 | hs_pfos_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFOS | Postnatal | NA | NA | Perfluorooctane sulfonate (PFOS) in child | numeric | Logarithm base 2 | PFOS | PFOS |
| hs_pfunda_c_Log2 | hs_pfunda_c_Log2 | Chemicals | Per- and polyfluoroalkyl substances (PFAS) | PFUNDA | Postnatal | NA | NA | Perfluoroundecanoate (PFUNDA) in child | numeric | Logarithm base 2 | PFUNDA | PFUNDA |
| hs_bpa_cadj_Log2 | hs_bpa_cadj_Log2 | Chemicals | Phenols | BPA | Postnatal | NA | NA | Bisphenol A (BPA) in child adjusted for creatinine | numeric | Logarithm base 2 | BPA | BPA |
| hs_bupa_cadj_Log2 | hs_bupa_cadj_Log2 | Chemicals | Phenols | BUPA | Postnatal | NA | NA | N-Butyl paraben (BUPA) in child adjusted for creatinine | numeric | Logarithm base 2 | BUPA | BUPA |
| hs_etpa_cadj_Log2 | hs_etpa_cadj_Log2 | Chemicals | Phenols | ETPA | Postnatal | NA | NA | Ethyl paraben (ETPA) in child adjusted for creatinine | numeric | Logarithm base 2 | ETPA | ETPA |
| hs_mepa_cadj_Log2 | hs_mepa_cadj_Log2 | Chemicals | Phenols | MEPA | Postnatal | NA | NA | Methyl paraben (MEPA) in child adjusted for creatinine | numeric | Logarithm base 2 | MEPA | MEPA |
| hs_oxbe_cadj_Log2 | hs_oxbe_cadj_Log2 | Chemicals | Phenols | OXBE | Postnatal | NA | NA | Oxybenzone (OXBE) in child adjusted for creatinine | numeric | Logarithm base 2 | OXBE | OXBE |
| hs_prpa_cadj_Log2 | hs_prpa_cadj_Log2 | Chemicals | Phenols | PRPA | Postnatal | NA | NA | Propyl paraben (PRPA) in child adjusted for creatinine | numeric | Logarithm base 2 | PRPA | PRPA |
| hs_trcs_cadj_Log2 | hs_trcs_cadj_Log2 | Chemicals | Phenols | TRCS | Postnatal | NA | NA | Triclosan (TRCS) in child adjusted for creatinine | numeric | Logarithm base 2 | TRCS | TRCS |
| hs_mbzp_cadj_Log2 | hs_mbzp_cadj_Log2 | Chemicals | Phthalates | MBZP | Postnatal | NA | NA | Mono benzyl phthalate (MBzP) in child adjusted for creatinine | numeric | Logarithm base 2 | MBZP | MBZP |
| hs_mecpp_cadj_Log2 | hs_mecpp_cadj_Log2 | Chemicals | Phthalates | MECPP | Postnatal | NA | NA | Mono-2-ethyl 5-carboxypentyl phthalate (MECPP) in child adjusted for creatinine | numeric | Logarithm base 2 | MECPP | MECPP |
| hs_mehhp_cadj_Log2 | hs_mehhp_cadj_Log2 | Chemicals | Phthalates | MEHHP | Postnatal | NA | NA | Mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEHHP | MEHHP |
| hs_mehp_cadj_Log2 | hs_mehp_cadj_Log2 | Chemicals | Phthalates | MEHP | Postnatal | NA | NA | Mono-2-ethylhexyl phthalate (MEHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEHP | MEHP |
| hs_meohp_cadj_Log2 | hs_meohp_cadj_Log2 | Chemicals | Phthalates | MEOHP | Postnatal | NA | NA | Mono-2-ethyl-5-oxohexyl phthalate (MEOHP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEOHP | MEOHP |
| hs_mep_cadj_Log2 | hs_mep_cadj_Log2 | Chemicals | Phthalates | MEP | Postnatal | NA | NA | Monoethyl phthalate (MEP) in child adjusted for creatinine | numeric | Logarithm base 2 | MEP | MEP |
| hs_mibp_cadj_Log2 | hs_mibp_cadj_Log2 | Chemicals | Phthalates | MIBP | Postnatal | NA | NA | Mono-iso-butyl phthalate (MiBP) in child adjusted for creatinine | numeric | Logarithm base 2 | MIBP | MIBP |
| hs_mnbp_cadj_Log2 | hs_mnbp_cadj_Log2 | Chemicals | Phthalates | MNBP | Postnatal | NA | NA | Mono-n-butyl phthalate (MnBP) in child adjusted for creatinine | numeric | Logarithm base 2 | MNBP | MNBP |
| hs_ohminp_cadj_Log2 | hs_ohminp_cadj_Log2 | Chemicals | Phthalates | OHMiNP | Postnatal | NA | NA | Mono-4-methyl-7-hydroxyoctyl phthalate (OHMiNP) in child adjusted for creatinine | numeric | Logarithm base 2 | OHMiNP | OHMiNP |
| hs_oxominp_cadj_Log2 | hs_oxominp_cadj_Log2 | Chemicals | Phthalates | OXOMINP | Postnatal | NA | NA | Mono-4-methyl-7-oxooctyl phthalate (OXOMiNP) in child adjusted for creatinine | numeric | Logarithm base 2 | OXOMINP | OXOMINP |
| hs_sumDEHP_cadj_Log2 | hs_sumDEHP_cadj_Log2 | Chemicals | Phthalates | DEHP | Postnatal | NA | NA | Sum of DEHP metabolites (µg/g) in child adjusted for creatinine | numeric | Logarithm base 2 | DEHP | SumDEHP |
| FAS_cat_None | FAS_cat_None | Chemicals | Social and economic capital | Economic capital | Postnatal | NA | NA | Family affluence score | factor | None | Family affluence | FamAfl |
| hs_contactfam_3cat_num_None | hs_contactfam_3cat_num_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | scoial capital: family friends | factor | None | Social contact | SocCont |
| hs_hm_pers_None | hs_hm_pers_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | How many people live in your home? | numeric | None | House crowding | HouseCrow |
| hs_participation_3cat_None | hs_participation_3cat_None | Chemicals | Social and economic capital | Social capital | Postnatal | NA | NA | social capital: structural | factor | None | Social participation | SocPartic |
| hs_cotinine_cdich_None | hs_cotinine_cdich_None | Chemicals | Tobacco Smoke | Cotinine | Postnatal | NA | NA | Dichotomous variable of cotinine in child | factor | None | Cotinine | Cotinine |
| hs_globalexp2_None | hs_globalexp2_None | Chemicals | Tobacco Smoke | Tobacco Smoke | Postnatal | NA | NA | Global exposure of the child to ETS (2 categories) | factor | None | ETS | ETS |
| hs_smk_parents_None | hs_smk_parents_None | Chemicals | Tobacco Smoke | Tobacco Smoke | Postnatal | NA | NA | Tobacco Smoke status of parents (both) | factor | None | Smoking_parents | SmokPar |
| e3_sex_None | e3_sex_None | Covariates | Covariates | Child covariate | Pregnancy | NA | NA | Child sex (female / male) | factor | None | Child sex | Sex |
| e3_yearbir_None | e3_yearbir_None | Covariates | Covariates | Child covariate | Pregnancy | NA | NA | Year of birth (2003 to 2009) | factor | None | Year of birth | YearBirth |
| h_cohort | h_cohort | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Cohort of inclusion (1 to 6) | factor | None | Cohort | Cohort |
| h_edumc_None | h_edumc_None | Covariates | Covariates | Maternal covariate | Pregnancy | NA | NA | Maternal education (1: primary school, 2:secondary school, 3:university degree or higher) | factor | None | Maternal education | mEducation |
| hs_child_age_None | hs_child_age_None | Covariates | Covariates | Child covariate | Postnatal | NA | NA | Child age at examination (years) | numeric | None | Child age | cAge |
| hs_zbmi_who | hs_zbmi_who | Phenotype | Phenotype | Outcome at 6-11 years old | Postnatal | NA | NA | Body mass index z-score at 6-11 years old - WHO reference - Standardized on sex and age | numeric | None | Body mass index z-score | zBMI |
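The `transformation` column above lists two recurring transformations, tertiles and log base 2. The sketch below shows one common way each could be computed on toy data. It is an assumption about the general technique, not a record of how the HELIX variables were actually derived, and the vectors `intake` and `conc` are made up for illustration.

```r
# Simulated illustration only: `intake` and `conc` are hypothetical inputs.
set.seed(7)
intake <- rexp(300, rate = 0.5)  # e.g. weekly servings of some food group

# Tertiles: cut at the 1/3 and 2/3 quantiles so that each category
# holds roughly one third of the observations
breaks <- quantile(intake, probs = c(0, 1/3, 2/3, 1))
intake_ter <- cut(intake, breaks = breaks, include.lowest = TRUE,
                  labels = c("low", "medium", "high"))
tertile_counts <- table(intake_ter)
print(tertile_counts)

# Log base 2: a one-unit increase corresponds to a doubling of concentration
conc <- c(0.5, 1, 2, 8)
conc_log2 <- log2(conc)
print(conc_log2)
```

Working on the log2 scale means that regression coefficients for these exposures describe the change in outcome per doubling of the measured concentration.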
# specific lifestyle exposures
lifestyle_exposures <- c(
"h_bfdur_Ter",
"hs_bakery_prod_Ter",
"hs_break_cer_Ter",
"hs_dairy_Ter",
"hs_fastfood_Ter",
"hs_org_food_Ter",
"hs_proc_meat_Ter",
"hs_total_fish_Ter",
"hs_total_fruits_Ter",
"hs_total_lipids_Ter",
"hs_total_sweets_Ter",
"hs_total_veg_Ter"
)
lifestyle_exposome <- dplyr::select(exposome, all_of(lifestyle_exposures))
summarytools::view(dfSummary(lifestyle_exposome, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Missing |
|---|---|---|
| 1 | h_bfdur_Ter [factor] | 0 (0.0%) |
| 2 | hs_bakery_prod_Ter [factor] | 0 (0.0%) |
| 3 | hs_break_cer_Ter [factor] | 0 (0.0%) |
| 4 | hs_dairy_Ter [factor] | 0 (0.0%) |
| 5 | hs_fastfood_Ter [factor] | 0 (0.0%) |
| 6 | hs_org_food_Ter [factor] | 0 (0.0%) |
| 7 | hs_proc_meat_Ter [factor] | 0 (0.0%) |
| 8 | hs_total_fish_Ter [factor] | 0 (0.0%) |
| 9 | hs_total_fruits_Ter [factor] | 0 (0.0%) |
| 10 | hs_total_lipids_Ter [factor] | 0 (0.0%) |
| 11 | hs_total_sweets_Ter [factor] | 0 (0.0%) |
| 12 | hs_total_veg_Ter [factor] | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-09
# keep the categorical (factor) lifestyle exposures
categorical_lifestyle <- lifestyle_exposome %>%
  dplyr::select(where(is.factor))
# reshape to long format: one row per (variable, value) pair
categorical_lifestyle_long <- pivot_longer(
  categorical_lifestyle,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)
unique_categorical_vars <- unique(categorical_lifestyle_long$variable)
# one bar chart of category counts per variable
categorical_plots <- lapply(unique_categorical_vars, function(var) {
  data <- dplyr::filter(categorical_lifestyle_long, variable == var)
  p <- ggplot(data, aes(x = value, fill = value)) +
    geom_bar(stat = "count") +
    labs(title = paste("Distribution of", var), x = var, y = "Count")
  print(p)
  return(p)
})
Breastfeeding Duration: Majority of observations are in the highest duration category, suggesting longer breastfeeding periods are common.
Bakery Products: Shows a relatively even distribution across the three categories, indicating varied consumption levels of bakery products among participants.
Breakfast Cereal: The highest category of cereal consumption is the most common, suggesting a preference for or greater consumption of cereals.
Dairy: Shows a fairly even distribution across all categories, indicating a uniform consumption pattern of dairy products.
Fast Food: Most participants fall into the middle category, indicating moderate consumption of fast food.
Organic Food: Most participants either consume a lot of or no organic food, with fewer in the middle range.
Processed Meat: Consumption levels are fairly evenly distributed, indicating varied dietary habits regarding processed meats.
Bread: Distribution shows a significant leaning towards higher bread consumption.
Cereal: Even distribution across categories suggests varied cereal consumption habits.
Fish and Seafood: Even distribution across categories, indicating varied consumption of fish and seafood.
Fruits: High fruit consumption is the most common, with fewer participants in the lowest category.
Added Fats: More participants consume added fats at the lowest and highest levels, with fewer in the middle.
Sweets: High consumption of sweets is the most common, indicating a preference for or higher access to sugary foods.
Vegetables: Most participants consume a high amount of vegetables.
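Visual readings like these can be backed with the exact share of participants in each category. The following is a minimal sketch on simulated factors; the column names and level labels are hypothetical and do not reproduce the HELIX tertile labels.

```r
# Simulated illustration only: columns and level labels are hypothetical.
set.seed(3)
lv <- c("low", "medium", "high")
toy_lifestyle <- data.frame(
  veg_ter   = factor(sample(lv, 200, replace = TRUE, prob = c(0.2, 0.3, 0.5)),
                     levels = lv),
  dairy_ter = factor(sample(lv, 200, replace = TRUE), levels = lv)
)

# Share of observations in each category, per variable
level_shares <- lapply(toy_lifestyle, function(x) prop.table(table(x)))

# Print as percentages rounded to one decimal place
print(lapply(level_shares, function(s) round(100 * s, 1)))
```

Applied to a data frame of factors, the same `lapply()` call would give a per-variable table that can be compared directly against the bar charts.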
# specific chemical exposures
chemical_exposures <- c(
"hs_cd_c_Log2",
"hs_co_c_Log2",
"hs_cs_c_Log2",
"hs_cu_c_Log2",
"hs_hg_c_Log2",
"hs_mo_c_Log2",
"hs_pb_c_Log2",
"hs_dde_cadj_Log2",
"hs_pcb153_cadj_Log2",
"hs_pcb170_cadj_Log2",
"hs_dep_cadj_Log2",
"hs_pbde153_cadj_Log2",
"hs_pfhxs_c_Log2",
"hs_pfoa_c_Log2",
"hs_pfos_c_Log2",
"hs_prpa_cadj_Log2",
"hs_mbzp_cadj_Log2",
"hs_mibp_cadj_Log2",
"hs_mnbp_cadj_Log2"
)
chemical_exposome <- dplyr::select(exposome, all_of(chemical_exposures))
summarytools::view(dfSummary(chemical_exposome, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Freqs (% of Valid) | Missing |
|---|---|---|---|
| 1 | hs_cd_c_Log2 [numeric] | 695 distinct values | 0 (0.0%) |
| 2 | hs_co_c_Log2 [numeric] | 317 distinct values | 0 (0.0%) |
| 3 | hs_cs_c_Log2 [numeric] | 369 distinct values | 0 (0.0%) |
| 4 | hs_cu_c_Log2 [numeric] | 345 distinct values | 0 (0.0%) |
| 5 | hs_hg_c_Log2 [numeric] | 698 distinct values | 0 (0.0%) |
| 6 | hs_mo_c_Log2 [numeric] | 593 distinct values | 0 (0.0%) |
| 7 | hs_pb_c_Log2 [numeric] | 529 distinct values | 0 (0.0%) |
| 8 | hs_dde_cadj_Log2 [numeric] | 1050 distinct values | 0 (0.0%) |
| 9 | hs_pcb153_cadj_Log2 [numeric] | 1047 distinct values | 0 (0.0%) |
| 10 | hs_pcb170_cadj_Log2 [numeric] | 1039 distinct values | 0 (0.0%) |
| 11 | hs_dep_cadj_Log2 [numeric] | 1045 distinct values | 0 (0.0%) |
| 12 | hs_pbde153_cadj_Log2 [numeric] | 1036 distinct values | 0 (0.0%) |
| 13 | hs_pfhxs_c_Log2 [numeric] | 1061 distinct values | 0 (0.0%) |
| 14 | hs_pfoa_c_Log2 [numeric] | 1061 distinct values | 0 (0.0%) |
| 15 | hs_pfos_c_Log2 [numeric] | 1050 distinct values | 0 (0.0%) |
| 16 | hs_prpa_cadj_Log2 [numeric] | 1031 distinct values | 0 (0.0%) |
| 17 | hs_mbzp_cadj_Log2 [numeric] | 1046 distinct values | 0 (0.0%) |
| 18 | hs_mibp_cadj_Log2 [numeric] | 1057 distinct values | 0 (0.0%) |
| 19 | hs_mnbp_cadj_Log2 [numeric] | 1048 distinct values | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-09
# keep the numeric chemical exposures
numeric_chemical <- chemical_exposome %>%
  dplyr::select(where(is.numeric))
# reshape to long format: one row per (variable, value) pair
numeric_chemical_long <- pivot_longer(
  numeric_chemical,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)
unique_numerical_vars <- unique(numeric_chemical_long$variable)
# one histogram per chemical exposure
num_plots <- lapply(unique_numerical_vars, function(var) {
  data <- dplyr::filter(numeric_chemical_long, variable == var)
  p <- ggplot(data, aes(x = value)) +
    geom_histogram(bins = 30, fill = "blue") +
    labs(title = paste("Histogram of", var), x = "Value", y = "Count")
  print(p)
  return(p)
})
Cadmium (hs_cd_c_Log2): The distribution of cadmium levels is skewed to the right, indicating that most participants have lower exposure levels, with a few cases showing significantly higher exposures.
Cobalt (hs_co_c_Log2): The histogram of cobalt levels is roughly bell-shaped with a slight positive skew. This suggests a common source of exposure with varying levels among the population.
Cesium (hs_cs_c_Log2): Exhibits a right-skewed distribution, indicating that most participants have relatively low exposure levels, but a small number have substantially higher exposures.
Copper (hs_cu_c_Log2): Shows a right-skewed distribution, suggesting that while most individuals have moderate exposure, a few experience significantly higher levels of copper.
Mercury (hs_hg_c_Log2): This distribution is also right-skewed, common for environmental pollutants, where a majority have lower exposure levels, and a minority have high exposure levels.
Molybdenum (hs_mo_c_Log2): Shows a distribution with a sharp peak and a long right tail, suggesting that while most people have similar exposure levels, a few have exceptionally high exposures.
Lead (hs_pb_c_Log2): The distribution is slightly right-skewed, indicating higher exposure levels in a smaller group of the population compared to the majority.
DDE (hs_dde_cadj_Log2): Shows a pronounced right skew, typical for chemicals that accumulate in the environment and in human tissues, indicating higher levels of exposure in a smaller subset of the population.
PCB 153 (hs_pcb153_cadj_Log2): Has a distribution with right skewness, suggesting that exposure to these compounds is higher among a smaller segment of the population.
PCB 170 (hs_pcb170_cadj_Log2): This histogram shows a significant right skew, indicating lower concentrations of this chemical in most samples, with fewer samples showing higher concentrations. This pattern suggests that while most individuals have low exposure, a few may have considerably higher levels.
DEP and PBDE 153: These histograms mostly show multimodal distributions (more than one peak), suggesting different exposure sources or groups within the population that have distinct exposure levels. The multiple peaks could indicate varied exposure pathways or differences in how these chemicals are metabolized or retained in the body.
PFHxS and PFOA: These perfluorinated compounds display roughly bell-shaped distributions with a right skew, suggesting a common source of exposure among the population, but with some individuals experiencing higher exposures.
PFOS and PFUnDA: The histograms show a single, sharp peak with a rapid decline, indicating that most individuals have similar exposure levels, likely due to common environmental sources or regulatory controls limiting variability.
MBZP (Monobenzyl Phthalate): This histogram shows a right-skewed distribution. Most values cluster at the lower end, indicating a common lower exposure level among subjects, with a long tail towards higher values suggesting occasional higher exposures.
MECPP (Mono-2-ethyl 5-carboxypentyl phthalate): The distribution is right-skewed, similar to MBZP, but with a smoother decline. This pattern also indicates that while most subjects have lower exposure levels, a few experience significantly higher exposures.
MEHHP (Mono-2-ethyl-5-hydroxyhexyl phthalate): Exhibits a unimodal distribution with a peak around a middle value and symmetric tails. This could indicate a more standardized exposure level among the subjects with some variation.
MEHP (Mono-2-ethylhexyl phthalate): Another right-skewed distribution, indicating that most subjects have lower exposure levels but a few have much higher levels.
MEOHP (Mono-2-ethyl-5-oxohexyl phthalate): This histogram shows a distribution with a peak around the middle values and a tail extending towards higher values, suggesting a central tendency with some higher exposures.
MEP (Mono-ethyl phthalate): The distribution is right-skewed, similar to others, showing most subjects with low to moderate levels of exposure, but a few have much higher levels.
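The visual descriptions above ("right-skewed", "symmetric") could be backed by a number. The following is a minimal sketch, not part of the original analysis: `sample_skewness` is a hypothetical helper, and the data are simulated purely for illustration.

```r
# Sample skewness (third standardized moment).
# Positive values indicate a right skew, values near 0 a symmetric shape.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Illustrative data only (NOT the study data):
# a log-normal sample is right-skewed, a normal sample is symmetric.
set.seed(1)
toy <- data.frame(
  right_skewed = rlnorm(500),
  symmetric    = rnorm(500)
)

skew_table <- sapply(toy, sample_skewness)
print(round(skew_table, 2))
```

The same helper could be applied column-wise to the numeric exposure columns with `sapply()` to rank exposures by skewness instead of judging each histogram by eye.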
# Keep only the numeric exposure columns
numeric_chemical <- select_if(chemical_exposome, is.numeric)
# Spearman rank correlation, which is less sensitive to the skewed distributions seen above
cor_matrix <- cor(numeric_chemical, method = "spearman")
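# Diverging colour scale: dark red (-1), white (0), dark blue (+1)
custom_color_scale <- list(
  c(0, "darkred"),
  c(0.5, "white"),
  c(1, "darkblue")
)
plot_ly(
  z = cor_matrix,
  x = colnames(cor_matrix),
  y = colnames(cor_matrix),
  type = "heatmap",
  colorscale = custom_color_scale,
  zmin = -1, # fix the colour range so that white corresponds to a correlation of 0
  zmax = 1
) %>%
  layout(
    title = "Correlation Matrix",
    xaxis = list(tickangle = -90),
    yaxis = list(side = "left")
  )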
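A heatmap with this many variables is hard to read pair by pair, so it can help to list the strongest correlations as a table. The sketch below is an illustration under stated assumptions: `top_correlated_pairs` is a hypothetical helper (not from the original analysis) and the data are simulated.

```r
# Return the n variable pairs with the largest absolute correlation.
# Only the upper triangle is scanned, so each pair appears once
# and the diagonal (self-correlation) is excluded.
top_correlated_pairs <- function(cor_mat, n = 5) {
  idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    rho  = cor_mat[idx],
    stringsAsFactors = FALSE
  )
  pairs <- pairs[order(-abs(pairs$rho)), ]
  rownames(pairs) <- NULL
  head(pairs, n)
}

# Illustrative data only: b is a noisy copy of a, c is independent.
set.seed(2)
a <- rnorm(200)
toy <- data.frame(a = a, b = a + rnorm(200, sd = 0.1), c = rnorm(200))
toy_cor <- cor(toy, method = "spearman")

top_pairs <- top_correlated_pairs(toy_cor, n = 2)
print(top_pairs)
```

Applied to the Spearman matrix computed above, the same call would list the exposure pairs that drive the darkest cells of the heatmap.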
# Specified covariates
specific_covariates <- c(
"e3_sex_None",
"e3_yearbir_None",
"h_edumc_None",
"h_cohort",
"hs_child_age_None"
)
covariate_data <- dplyr::select(covariates, all_of(specific_covariates))
summarytools::view(dfSummary(covariate_data, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Freqs (% of Valid) | Missing |
|---|---|---|---|
| 1 | e3_sex_None [factor] | | 0 (0.0%) |
| 2 | e3_yearbir_None [factor] | | 0 (0.0%) |
| 3 | h_edumc_None [factor] | | 0 (0.0%) |
| 4 | h_cohort [factor] | | 0 (0.0%) |
| 5 | hs_child_age_None [numeric] | 879 distinct values | 0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.4.0)
2024-07-09
# Separate numeric and categorical data
numeric_covariates <- covariate_data %>%
  dplyr::select(where(is.numeric))
numeric_covariates_long <- pivot_longer(
  numeric_covariates,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)
unique_numerical_vars <- unique(numeric_covariates_long$variable)
# One histogram per numeric covariate; each plot is printed and also returned in a list
num_plots <- lapply(unique_numerical_vars, function(var) {
  data <- dplyr::filter(numeric_covariates_long, variable == var)
  p <- ggplot(data, aes(x = value)) +
    geom_histogram(bins = 30, fill = "blue") +
    labs(title = paste("Histogram of", var), x = "Value", y = "Count")
  print(p)
  return(p)
})
Child’s Age (hs_child_age_None): The histogram is multimodal, with peaks at several ages. This most likely reflects the cohorts being assessed at different ages: the by-cohort summary table reports mean ages ranging from about 6.5 to 10.8 years.
categorical_covariates <- covariate_data %>%
  dplyr::select(where(is.factor))
categorical_covariates_long <- pivot_longer(
  categorical_covariates,
  cols = everything(),
  names_to = "variable",
  values_to = "value"
)
unique_categorical_vars <- unique(categorical_covariates_long$variable)
# One bar chart per categorical covariate; each plot is printed and also returned in a list
categorical_plots <- lapply(unique_categorical_vars, function(var) {
  data <- dplyr::filter(categorical_covariates_long, variable == var)
  p <- ggplot(data, aes(x = value, fill = value)) +
    geom_bar(stat = "count") +
    labs(title = paste("Distribution of", var), x = var, y = "Count")
  print(p)
  return(p)
})
Cohorts (h_cohort): The distribution shows the count of subjects across six different cohorts. All cohorts have a substantial number of subjects, with cohort 5 showing the highest participation.
Gender Distribution (e3_sex_None): The gender distribution is nearly balanced, with a slightly higher count of males than females.
Year of Birth (e3_yearbir_None): Most subjects were born between 2005 and 2008, with the largest group born in 2008 (29.1%). Very few were born in 2003 or 2009, which likely reflects cohort-specific recruitment periods.
Educational Level (h_edumc_None): Represents three categories of educational attainment, with category 3 having the highest count (51.8%), suggesting a higher level of education among the majority of the subjects.
outcome_BMI <- phenotype %>%
dplyr::select(hs_zbmi_who)
summarytools::view(dfSummary(outcome_BMI, style = 'grid', plain.ascii = FALSE, valid.col = FALSE, headings = FALSE), method = "render")
| No | Variable | Freqs (% of Valid) | Missing |
|---|---|---|---|
| 1 | hs_zbmi_who [numeric] | 421 distinct values | 0 (0.0%) |
# Combine all selected data
combined_data <- cbind(covariate_data, lifestyle_exposome, chemical_exposome, outcome_BMI)
# Ensure no duplicated columns
combined_data <- combined_data[, !duplicated(colnames(combined_data))]
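# Convert sex variable to a factor for stratification
combined_data$e3_sex_None <- as.factor(combined_data$e3_sex_None)
# Caution: levels<- relabels by position. Check levels(combined_data$e3_sex_None) first;
# if the existing order is alphabetical ("female", "male"), this assignment swaps the labels.
levels(combined_data$e3_sex_None) <- c("Male", "Female")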
render_cont <- function(x) {
with(stats.default(x), sprintf("%0.2f (%0.2f)", MEAN, SD))
}
render_cat <- function(x) {
c("", sapply(stats.default(x), function(y) with(y, sprintf("%d (%0.1f %%)", FREQ, PCT))))
}
# Define the formula for table1
table1_formula <- ~
hs_child_age_None + e3_yearbir_None + h_edumc_None + h_cohort +
hs_zbmi_who +
h_bfdur_Ter + hs_bakery_prod_Ter + hs_break_cer_Ter + hs_dairy_Ter + hs_fastfood_Ter + hs_org_food_Ter +
hs_proc_meat_Ter +
hs_total_fish_Ter + hs_total_fruits_Ter + hs_total_lipids_Ter + hs_total_sweets_Ter + hs_total_veg_Ter +
hs_cd_c_Log2 + hs_co_c_Log2 + hs_cs_c_Log2 + hs_cu_c_Log2 +
hs_hg_c_Log2 + hs_mo_c_Log2 + hs_dde_cadj_Log2 + hs_pcb153_cadj_Log2 +
hs_pcb170_cadj_Log2 + hs_dep_cadj_Log2 + hs_pbde153_cadj_Log2 +
hs_pfhxs_c_Log2 + hs_pfoa_c_Log2 + hs_pfos_c_Log2 + hs_prpa_cadj_Log2 +
hs_mbzp_cadj_Log2 + hs_mibp_cadj_Log2 + hs_mnbp_cadj_Log2 | e3_sex_None
# Create the table
table1(
  table1_formula,
  data = combined_data,
  render.continuous = render_cont,
  render.categorical = render_cat,
  overall = "Overall", # `overall` takes the column label; TRUE would be printed literally
  topclass = "Rtable1-shade"
)
| | Male (N=608) | Female (N=693) | Overall (N=1301) |
|---|---|---|---|
| hs_child_age_None | 7.91 (1.58) | 8.03 (1.64) | 7.98 (1.61) |
| e3_yearbir_None | |||
| 2003 | 25 (4.1 %) | 30 (4.3 %) | 55 (4.2 %) |
| 2004 | 46 (7.6 %) | 61 (8.8 %) | 107 (8.2 %) |
| 2005 | 121 (19.9 %) | 120 (17.3 %) | 241 (18.5 %) |
| 2006 | 108 (17.8 %) | 148 (21.4 %) | 256 (19.7 %) |
| 2007 | 128 (21.1 %) | 122 (17.6 %) | 250 (19.2 %) |
| 2008 | 177 (29.1 %) | 202 (29.1 %) | 379 (29.1 %) |
| 2009 | 3 (0.5 %) | 10 (1.4 %) | 13 (1.0 %) |
| h_edumc_None | |||
| 1 | 96 (15.8 %) | 82 (11.8 %) | 178 (13.7 %) |
| 2 | 195 (32.1 %) | 254 (36.7 %) | 449 (34.5 %) |
| 3 | 317 (52.1 %) | 357 (51.5 %) | 674 (51.8 %) |
| h_cohort | |||
| 1 | 97 (16.0 %) | 105 (15.2 %) | 202 (15.5 %) |
| 2 | 86 (14.1 %) | 112 (16.2 %) | 198 (15.2 %) |
| 3 | 102 (16.8 %) | 122 (17.6 %) | 224 (17.2 %) |
| 4 | 93 (15.3 %) | 114 (16.5 %) | 207 (15.9 %) |
| 5 | 129 (21.2 %) | 143 (20.6 %) | 272 (20.9 %) |
| 6 | 101 (16.6 %) | 97 (14.0 %) | 198 (15.2 %) |
| hs_zbmi_who | 0.35 (1.15) | 0.45 (1.22) | 0.40 (1.19) |
| h_bfdur_Ter | |||
| (0,10.8] | 231 (38.0 %) | 275 (39.7 %) | 506 (38.9 %) |
| (10.8,34.9] | 118 (19.4 %) | 152 (21.9 %) | 270 (20.8 %) |
| (34.9,Inf] | 259 (42.6 %) | 266 (38.4 %) | 525 (40.4 %) |
| hs_bakery_prod_Ter | |||
| (0,2] | 164 (27.0 %) | 181 (26.1 %) | 345 (26.5 %) |
| (2,6] | 188 (30.9 %) | 235 (33.9 %) | 423 (32.5 %) |
| (6,Inf] | 256 (42.1 %) | 277 (40.0 %) | 533 (41.0 %) |
| hs_break_cer_Ter | |||
| (0,1.1] | 141 (23.2 %) | 150 (21.6 %) | 291 (22.4 %) |
| (1.1,5.5] | 251 (41.3 %) | 270 (39.0 %) | 521 (40.0 %) |
| (5.5,Inf] | 216 (35.5 %) | 273 (39.4 %) | 489 (37.6 %) |
| hs_dairy_Ter | |||
| (0,14.6] | 175 (28.8 %) | 184 (26.6 %) | 359 (27.6 %) |
| (14.6,25.6] | 229 (37.7 %) | 236 (34.1 %) | 465 (35.7 %) |
| (25.6,Inf] | 204 (33.6 %) | 273 (39.4 %) | 477 (36.7 %) |
| hs_fastfood_Ter | |||
| (0,0.132] | 75 (12.3 %) | 68 (9.8 %) | 143 (11.0 %) |
| (0.132,0.5] | 273 (44.9 %) | 330 (47.6 %) | 603 (46.3 %) |
| (0.5,Inf] | 260 (42.8 %) | 295 (42.6 %) | 555 (42.7 %) |
| hs_org_food_Ter | |||
| (0,0.132] | 211 (34.7 %) | 218 (31.5 %) | 429 (33.0 %) |
| (0.132,1] | 191 (31.4 %) | 205 (29.6 %) | 396 (30.4 %) |
| (1,Inf] | 206 (33.9 %) | 270 (39.0 %) | 476 (36.6 %) |
| hs_proc_meat_Ter | |||
| (0,1.5] | 175 (28.8 %) | 191 (27.6 %) | 366 (28.1 %) |
| (1.5,4] | 227 (37.3 %) | 244 (35.2 %) | 471 (36.2 %) |
| (4,Inf] | 206 (33.9 %) | 258 (37.2 %) | 464 (35.7 %) |
| hs_total_fish_Ter | |||
| (0,1.5] | 183 (30.1 %) | 206 (29.7 %) | 389 (29.9 %) |
| (1.5,3] | 224 (36.8 %) | 230 (33.2 %) | 454 (34.9 %) |
| (3,Inf] | 201 (33.1 %) | 257 (37.1 %) | 458 (35.2 %) |
| hs_total_fruits_Ter | |||
| (0,7] | 174 (28.6 %) | 239 (34.5 %) | 413 (31.7 %) |
| (7,14.1] | 216 (35.5 %) | 191 (27.6 %) | 407 (31.3 %) |
| (14.1,Inf] | 218 (35.9 %) | 263 (38.0 %) | 481 (37.0 %) |
| hs_total_lipids_Ter | |||
| (0,3] | 193 (31.7 %) | 204 (29.4 %) | 397 (30.5 %) |
| (3,7] | 171 (28.1 %) | 232 (33.5 %) | 403 (31.0 %) |
| (7,Inf] | 244 (40.1 %) | 257 (37.1 %) | 501 (38.5 %) |
| hs_total_sweets_Ter | |||
| (0,4.1] | 149 (24.5 %) | 195 (28.1 %) | 344 (26.4 %) |
| (4.1,8.5] | 251 (41.3 %) | 265 (38.2 %) | 516 (39.7 %) |
| (8.5,Inf] | 208 (34.2 %) | 233 (33.6 %) | 441 (33.9 %) |
| hs_total_veg_Ter | |||
| (0,6] | 190 (31.2 %) | 214 (30.9 %) | 404 (31.1 %) |
| (6,8.5] | 136 (22.4 %) | 178 (25.7 %) | 314 (24.1 %) |
| (8.5,Inf] | 282 (46.4 %) | 301 (43.4 %) | 583 (44.8 %) |
| hs_cd_c_Log2 | -3.99 (0.98) | -3.95 (1.09) | -3.97 (1.04) |
| hs_co_c_Log2 | -2.37 (0.61) | -2.32 (0.64) | -2.34 (0.63) |
| hs_cs_c_Log2 | 0.44 (0.58) | 0.44 (0.57) | 0.44 (0.57) |
| hs_cu_c_Log2 | 9.81 (0.25) | 9.84 (0.22) | 9.83 (0.23) |
| hs_hg_c_Log2 | -0.24 (1.59) | -0.35 (1.75) | -0.30 (1.68) |
| hs_mo_c_Log2 | -0.32 (0.83) | -0.31 (0.96) | -0.32 (0.90) |
| hs_dde_cadj_Log2 | 4.63 (1.48) | 4.70 (1.50) | 4.67 (1.49) |
| hs_pcb153_cadj_Log2 | 3.47 (0.86) | 3.63 (0.94) | 3.56 (0.90) |
| hs_pcb170_cadj_Log2 | -0.60 (3.22) | -0.05 (2.77) | -0.31 (3.00) |
| hs_dep_cadj_Log2 | 0.27 (3.16) | 0.06 (3.25) | 0.16 (3.21) |
| hs_pbde153_cadj_Log2 | -4.66 (3.86) | -4.40 (3.80) | -4.53 (3.83) |
| hs_pfhxs_c_Log2 | -1.62 (1.30) | -1.53 (1.31) | -1.57 (1.31) |
| hs_pfoa_c_Log2 | 0.60 (0.55) | 0.62 (0.56) | 0.61 (0.55) |
| hs_pfos_c_Log2 | 0.95 (1.15) | 0.99 (1.08) | 0.97 (1.11) |
| hs_prpa_cadj_Log2 | -1.26 (3.96) | -1.91 (3.68) | -1.61 (3.82) |
| hs_mbzp_cadj_Log2 | 2.42 (1.23) | 2.47 (1.22) | 2.44 (1.22) |
| hs_mibp_cadj_Log2 | 5.54 (1.09) | 5.39 (1.12) | 5.46 (1.11) |
| hs_mnbp_cadj_Log2 | 4.77 (1.08) | 4.60 (0.96) | 4.68 (1.02) |
combined_data$h_cohort <- as.factor(combined_data$h_cohort)
# Create the table, stratified by cohort
table1(
  ~ hs_child_age_None + e3_sex_None + e3_yearbir_None + h_edumc_None +
    hs_zbmi_who + h_bfdur_Ter + hs_bakery_prod_Ter +
    hs_break_cer_Ter + hs_dairy_Ter + hs_fastfood_Ter +
    hs_org_food_Ter + hs_proc_meat_Ter + hs_total_fish_Ter + hs_total_fruits_Ter +
    hs_total_lipids_Ter +
    hs_total_sweets_Ter + hs_total_veg_Ter +
    hs_cd_c_Log2 + hs_co_c_Log2 + hs_cs_c_Log2 + hs_cu_c_Log2 +
    hs_hg_c_Log2 + hs_mo_c_Log2 + hs_dde_cadj_Log2 + hs_pcb153_cadj_Log2 +
    hs_pcb170_cadj_Log2 + hs_dep_cadj_Log2 + hs_pbde153_cadj_Log2 +
    hs_pfhxs_c_Log2 + hs_pfoa_c_Log2 + hs_pfos_c_Log2 + hs_prpa_cadj_Log2 +
    hs_mbzp_cadj_Log2 + hs_mibp_cadj_Log2 + hs_mnbp_cadj_Log2 | h_cohort,
  data = combined_data,
  render.continuous = render_cont,
  render.categorical = render_cat,
  overall = "Overall", # `overall` takes the column label; TRUE would be printed literally
  topclass = "Rtable1-shade"
)
| | 1 (N=202) | 2 (N=198) | 3 (N=224) | 4 (N=207) | 5 (N=272) | 6 (N=198) | Overall (N=1301) |
|---|---|---|---|---|---|---|---|
| hs_child_age_None | 6.61 (0.28) | 10.82 (0.58) | 8.78 (0.58) | 6.48 (0.47) | 8.46 (0.53) | 6.51 (0.30) | 7.98 (1.61) |
| e3_sex_None | |||||||
| Male | 97 (48.0 %) | 86 (43.4 %) | 102 (45.5 %) | 93 (44.9 %) | 129 (47.4 %) | 101 (51.0 %) | 608 (46.7 %) |
| Female | 105 (52.0 %) | 112 (56.6 %) | 122 (54.5 %) | 114 (55.1 %) | 143 (52.6 %) | 97 (49.0 %) | 693 (53.3 %) |
| e3_yearbir_None | |||||||
| 2003 | 0 (0.0 %) | 55 (27.8 %) | 0 (0.0 %) | 0 (0.0 %) | 0 (0.0 %) | 0 (0.0 %) | 55 (4.2 %) |
| 2004 | 0 (0.0 %) | 107 (54.0 %) | 0 (0.0 %) | 0 (0.0 %) | 0 (0.0 %) | 0 (0.0 %) | 107 (8.2 %) |
| 2005 | 0 (0.0 %) | 36 (18.2 %) | 120 (53.6 %) | 0 (0.0 %) | 85 (31.2 %) | 0 (0.0 %) | 241 (18.5 %) |
| 2006 | 0 (0.0 %) | 0 (0.0 %) | 99 (44.2 %) | 0 (0.0 %) | 157 (57.7 %) | 0 (0.0 %) | 256 (19.7 %) |
| 2007 | 82 (40.6 %) | 0 (0.0 %) | 5 (2.2 %) | 62 (30.0 %) | 30 (11.0 %) | 71 (35.9 %) | 250 (19.2 %) |
| 2008 | 117 (57.9 %) | 0 (0.0 %) | 0 (0.0 %) | 136 (65.7 %) | 0 (0.0 %) | 126 (63.6 %) | 379 (29.1 %) |
| 2009 | 3 (1.5 %) | 0 (0.0 %) | 0 (0.0 %) | 9 (4.3 %) | 0 (0.0 %) | 1 (0.5 %) | 13 (1.0 %) |
| h_edumc_None | |||||||
| 1 | 90 (44.6 %) | 14 (7.1 %) | 56 (25.0 %) | 9 (4.3 %) | 0 (0.0 %) | 9 (4.5 %) | 178 (13.7 %) |
| 2 | 42 (20.8 %) | 72 (36.4 %) | 91 (40.6 %) | 70 (33.8 %) | 60 (22.1 %) | 114 (57.6 %) | 449 (34.5 %) |
| 3 | 70 (34.7 %) | 112 (56.6 %) | 77 (34.4 %) | 128 (61.8 %) | 212 (77.9 %) | 75 (37.9 %) | 674 (51.8 %) |
| hs_zbmi_who | 0.20 (1.15) | 0.19 (1.13) | 0.80 (1.22) | 0.52 (1.22) | 0.09 (0.90) | 0.68 (1.37) | 0.40 (1.19) |
| h_bfdur_Ter | |||||||
| (0,10.8] | 74 (36.6 %) | 119 (60.1 %) | 70 (31.2 %) | 58 (28.0 %) | 101 (37.1 %) | 84 (42.4 %) | 506 (38.9 %) |
| (10.8,34.9] | 2 (1.0 %) | 57 (28.8 %) | 100 (44.6 %) | 30 (14.5 %) | 0 (0.0 %) | 81 (40.9 %) | 270 (20.8 %) |
| (34.9,Inf] | 126 (62.4 %) | 22 (11.1 %) | 54 (24.1 %) | 119 (57.5 %) | 171 (62.9 %) | 33 (16.7 %) | 525 (40.4 %) |
| hs_bakery_prod_Ter | |||||||
| (0,2] | 29 (14.4 %) | 41 (20.7 %) | 39 (17.4 %) | 34 (16.4 %) | 187 (68.8 %) | 15 (7.6 %) | 345 (26.5 %) |
| (2,6] | 66 (32.7 %) | 51 (25.8 %) | 89 (39.7 %) | 84 (40.6 %) | 74 (27.2 %) | 59 (29.8 %) | 423 (32.5 %) |
| (6,Inf] | 107 (53.0 %) | 106 (53.5 %) | 96 (42.9 %) | 89 (43.0 %) | 11 (4.0 %) | 124 (62.6 %) | 533 (41.0 %) |
| hs_break_cer_Ter | |||||||
| (0,1.1] | 18 (8.9 %) | 65 (32.8 %) | 61 (27.2 %) | 38 (18.4 %) | 57 (21.0 %) | 52 (26.3 %) | 291 (22.4 %) |
| (1.1,5.5] | 55 (27.2 %) | 67 (33.8 %) | 89 (39.7 %) | 101 (48.8 %) | 114 (41.9 %) | 95 (48.0 %) | 521 (40.0 %) |
| (5.5,Inf] | 129 (63.9 %) | 66 (33.3 %) | 74 (33.0 %) | 68 (32.9 %) | 101 (37.1 %) | 51 (25.8 %) | 489 (37.6 %) |
| hs_dairy_Ter | |||||||
| (0,14.6] | 21 (10.4 %) | 41 (20.7 %) | 55 (24.6 %) | 128 (61.8 %) | 76 (27.9 %) | 38 (19.2 %) | 359 (27.6 %) |
| (14.6,25.6] | 86 (42.6 %) | 49 (24.7 %) | 99 (44.2 %) | 51 (24.6 %) | 91 (33.5 %) | 89 (44.9 %) | 465 (35.7 %) |
| (25.6,Inf] | 95 (47.0 %) | 108 (54.5 %) | 70 (31.2 %) | 28 (13.5 %) | 105 (38.6 %) | 71 (35.9 %) | 477 (36.7 %) |
| hs_fastfood_Ter | |||||||
| (0,0.132] | 18 (8.9 %) | 23 (11.6 %) | 18 (8.0 %) | 51 (24.6 %) | 24 (8.8 %) | 9 (4.5 %) | 143 (11.0 %) |
| (0.132,0.5] | 40 (19.8 %) | 101 (51.0 %) | 127 (56.7 %) | 106 (51.2 %) | 169 (62.1 %) | 60 (30.3 %) | 603 (46.3 %) |
| (0.5,Inf] | 144 (71.3 %) | 74 (37.4 %) | 79 (35.3 %) | 50 (24.2 %) | 79 (29.0 %) | 129 (65.2 %) | 555 (42.7 %) |
| hs_org_food_Ter | |||||||
| (0,0.132] | 114 (56.4 %) | 51 (25.8 %) | 118 (52.7 %) | 19 (9.2 %) | 9 (3.3 %) | 118 (59.6 %) | 429 (33.0 %) |
| (0.132,1] | 40 (19.8 %) | 73 (36.9 %) | 70 (31.2 %) | 75 (36.2 %) | 109 (40.1 %) | 29 (14.6 %) | 396 (30.4 %) |
| (1,Inf] | 48 (23.8 %) | 74 (37.4 %) | 36 (16.1 %) | 113 (54.6 %) | 154 (56.6 %) | 51 (25.8 %) | 476 (36.6 %) |
| hs_proc_meat_Ter | |||||||
| (0,1.5] | 118 (58.4 %) | 47 (23.7 %) | 25 (11.2 %) | 83 (40.1 %) | 39 (14.3 %) | 54 (27.3 %) | 366 (28.1 %) |
| (1.5,4] | 32 (15.8 %) | 90 (45.5 %) | 85 (37.9 %) | 71 (34.3 %) | 85 (31.2 %) | 108 (54.5 %) | 471 (36.2 %) |
| (4,Inf] | 52 (25.7 %) | 61 (30.8 %) | 114 (50.9 %) | 53 (25.6 %) | 148 (54.4 %) | 36 (18.2 %) | 464 (35.7 %) |
| hs_total_fish_Ter | |||||||
| (0,1.5] | 82 (40.6 %) | 38 (19.2 %) | 25 (11.2 %) | 130 (62.8 %) | 38 (14.0 %) | 76 (38.4 %) | 389 (29.9 %) |
| (1.5,3] | 53 (26.2 %) | 103 (52.0 %) | 47 (21.0 %) | 57 (27.5 %) | 94 (34.6 %) | 100 (50.5 %) | 454 (34.9 %) |
| (3,Inf] | 67 (33.2 %) | 57 (28.8 %) | 152 (67.9 %) | 20 (9.7 %) | 140 (51.5 %) | 22 (11.1 %) | 458 (35.2 %) |
| hs_total_fruits_Ter | |||||||
| (0,7] | 26 (12.9 %) | 107 (54.0 %) | 83 (37.1 %) | 99 (47.8 %) | 35 (12.9 %) | 63 (31.8 %) | 413 (31.7 %) |
| (7,14.1] | 42 (20.8 %) | 45 (22.7 %) | 85 (37.9 %) | 64 (30.9 %) | 82 (30.1 %) | 89 (44.9 %) | 407 (31.3 %) |
| (14.1,Inf] | 134 (66.3 %) | 46 (23.2 %) | 56 (25.0 %) | 44 (21.3 %) | 155 (57.0 %) | 46 (23.2 %) | 481 (37.0 %) |
| hs_total_lipids_Ter | |||||||
| (0,3] | 18 (8.9 %) | 31 (15.7 %) | 151 (67.4 %) | 24 (11.6 %) | 32 (11.8 %) | 141 (71.2 %) | 397 (30.5 %) |
| (3,7] | 72 (35.6 %) | 90 (45.5 %) | 40 (17.9 %) | 74 (35.7 %) | 82 (30.1 %) | 45 (22.7 %) | 403 (31.0 %) |
| (7,Inf] | 112 (55.4 %) | 77 (38.9 %) | 33 (14.7 %) | 109 (52.7 %) | 158 (58.1 %) | 12 (6.1 %) | 501 (38.5 %) |
| hs_total_sweets_Ter | |||||||
| (0,4.1] | 50 (24.8 %) | 39 (19.7 %) | 93 (41.5 %) | 19 (9.2 %) | 89 (32.7 %) | 54 (27.3 %) | 344 (26.4 %) |
| (4.1,8.5] | 77 (38.1 %) | 61 (30.8 %) | 88 (39.3 %) | 58 (28.0 %) | 125 (46.0 %) | 107 (54.0 %) | 516 (39.7 %) |
| (8.5,Inf] | 75 (37.1 %) | 98 (49.5 %) | 43 (19.2 %) | 130 (62.8 %) | 58 (21.3 %) | 37 (18.7 %) | 441 (33.9 %) |
| hs_total_veg_Ter | |||||||
| (0,6] | 65 (32.2 %) | 53 (26.8 %) | 94 (42.0 %) | 81 (39.1 %) | 42 (15.4 %) | 69 (34.8 %) | 404 (31.1 %) |
| (6,8.5] | 41 (20.3 %) | 42 (21.2 %) | 69 (30.8 %) | 53 (25.6 %) | 57 (21.0 %) | 52 (26.3 %) | 314 (24.1 %) |
| (8.5,Inf] | 96 (47.5 %) | 103 (52.0 %) | 61 (27.2 %) | 73 (35.3 %) | 173 (63.6 %) | 77 (38.9 %) | 583 (44.8 %) |
| hs_cd_c_Log2 | -3.87 (0.84) | -4.06 (1.22) | -4.22 (1.23) | -4.16 (1.11) | -3.60 (0.74) | -3.99 (0.91) | -3.97 (1.04) |
| hs_co_c_Log2 | -2.31 (0.52) | -2.38 (0.56) | -2.46 (0.64) | -2.37 (0.64) | -2.53 (0.64) | -1.93 (0.56) | -2.34 (0.63) |
| hs_cs_c_Log2 | 0.12 (0.45) | 1.01 (0.47) | 0.61 (0.45) | -0.17 (0.39) | 0.71 (0.40) | 0.29 (0.39) | 0.44 (0.57) |
| hs_cu_c_Log2 | 9.86 (0.23) | 9.88 (0.25) | 9.83 (0.20) | 9.80 (0.21) | 9.71 (0.21) | 9.93 (0.21) | 9.83 (0.23) |
| hs_hg_c_Log2 | -0.56 (1.59) | 0.67 (1.29) | 0.92 (1.30) | -1.97 (1.49) | -0.34 (1.06) | -0.57 (1.69) | -0.30 (1.68) |
| hs_mo_c_Log2 | -0.13 (0.79) | -0.58 (1.18) | -0.55 (0.77) | -0.42 (0.84) | -0.17 (0.74) | -0.07 (0.95) | -0.32 (0.90) |
| hs_dde_cadj_Log2 | 3.81 (1.31) | 4.01 (1.28) | 4.36 (1.24) | 5.67 (1.29) | 4.26 (0.94) | 6.06 (1.41) | 4.67 (1.49) |
| hs_pcb153_cadj_Log2 | 2.73 (0.63) | 3.50 (0.76) | 3.66 (0.84) | 3.93 (0.85) | 4.22 (0.69) | 3.03 (0.68) | 3.56 (0.90) |
| hs_pcb170_cadj_Log2 | -2.44 (3.33) | 0.33 (1.89) | 0.41 (2.42) | -0.81 (3.58) | 1.38 (1.63) | -1.38 (3.14) | -0.31 (3.00) |
| hs_dep_cadj_Log2 | 1.44 (3.30) | -0.27 (3.31) | -0.15 (3.07) | -1.42 (3.25) | 0.62 (2.85) | 0.66 (2.82) | 0.16 (3.21) |
| hs_pbde153_cadj_Log2 | -3.39 (3.79) | -5.11 (3.61) | -5.05 (3.83) | -4.86 (3.78) | -2.66 (3.00) | -6.71 (3.67) | -4.53 (3.83) |
| hs_pfhxs_c_Log2 | -1.48 (1.03) | -0.51 (0.83) | -1.55 (0.88) | -2.69 (1.19) | -0.66 (0.76) | -2.83 (1.08) | -1.57 (1.31) |
| hs_pfoa_c_Log2 | 0.86 (0.50) | 0.56 (0.53) | 0.52 (0.51) | 0.42 (0.61) | 0.80 (0.43) | 0.46 (0.61) | 0.61 (0.55) |
| hs_pfos_c_Log2 | 0.57 (0.90) | 1.64 (0.78) | 0.43 (0.97) | 0.19 (1.29) | 1.67 (0.75) | 1.16 (0.88) | 0.97 (1.11) |
| hs_prpa_cadj_Log2 | -0.05 (3.69) | -2.65 (3.49) | 0.69 (3.83) | -2.00 (3.98) | -3.14 (2.92) | -2.22 (3.50) | -1.61 (3.82) |
| hs_mbzp_cadj_Log2 | 1.60 (1.16) | 2.81 (1.19) | 2.52 (1.09) | 2.81 (1.11) | 2.17 (1.11) | 2.85 (1.23) | 2.44 (1.22) |
| hs_mibp_cadj_Log2 | 6.07 (1.02) | 5.47 (1.07) | 4.88 (0.90) | 6.27 (0.87) | 4.74 (0.99) | 5.63 (0.83) | 5.46 (1.11) |
| hs_mnbp_cadj_Log2 | 4.74 (0.90) | 4.24 (0.86) | 3.99 (0.79) | 5.47 (0.86) | 4.79 (0.89) | 4.84 (1.12) | 4.68 (1.02) |
combined_data$h_edumc_None <- as.factor(combined_data$h_edumc_None)
# Create the table, stratified by education level
table1(
  ~ hs_child_age_None + e3_sex_None + e3_yearbir_None + hs_zbmi_who +
    h_bfdur_Ter + hs_bakery_prod_Ter + hs_break_cer_Ter + hs_dairy_Ter + hs_fastfood_Ter + hs_org_food_Ter +
    hs_proc_meat_Ter +
    hs_total_fish_Ter + hs_total_fruits_Ter + hs_total_lipids_Ter + hs_total_sweets_Ter +
    hs_total_veg_Ter + hs_cd_c_Log2 + hs_co_c_Log2 +
    hs_cs_c_Log2 + hs_cu_c_Log2 + hs_hg_c_Log2 + hs_mo_c_Log2 + hs_dde_cadj_Log2 +
    hs_pcb153_cadj_Log2 + hs_pcb170_cadj_Log2 + hs_dep_cadj_Log2 +
    hs_pbde153_cadj_Log2 + hs_pfhxs_c_Log2 + hs_pfoa_c_Log2 + hs_pfos_c_Log2 +
    hs_prpa_cadj_Log2 + hs_mbzp_cadj_Log2 + hs_mibp_cadj_Log2 + hs_mnbp_cadj_Log2 | h_edumc_None,
  data = combined_data,
  render.continuous = render_cont,
  render.categorical = render_cat,
  overall = "Overall", # `overall` takes the column label; TRUE would be printed literally
  topclass = "Rtable1-shade"
)
| | 1 (N=178) | 2 (N=449) | 3 (N=674) | Overall (N=1301) |
|---|---|---|---|---|
| hs_child_age_None | 7.61 (1.43) | 7.97 (1.68) | 8.07 (1.60) | 7.98 (1.61) |
| e3_sex_None | ||||
| Male | 96 (53.9 %) | 195 (43.4 %) | 317 (47.0 %) | 608 (46.7 %) |
| Female | 82 (46.1 %) | 254 (56.6 %) | 357 (53.0 %) | 693 (53.3 %) |
| e3_yearbir_None | ||||
| 2003 | 5 (2.8 %) | 20 (4.5 %) | 30 (4.5 %) | 55 (4.2 %) |
| 2004 | 6 (3.4 %) | 43 (9.6 %) | 58 (8.6 %) | 107 (8.2 %) |
| 2005 | 33 (18.5 %) | 84 (18.7 %) | 124 (18.4 %) | 241 (18.5 %) |
| 2006 | 25 (14.0 %) | 73 (16.3 %) | 158 (23.4 %) | 256 (19.7 %) |
| 2007 | 42 (23.6 %) | 89 (19.8 %) | 119 (17.7 %) | 250 (19.2 %) |
| 2008 | 65 (36.5 %) | 136 (30.3 %) | 178 (26.4 %) | 379 (29.1 %) |
| 2009 | 2 (1.1 %) | 4 (0.9 %) | 7 (1.0 %) | 13 (1.0 %) |
| hs_zbmi_who | 0.39 (1.27) | 0.57 (1.26) | 0.30 (1.11) | 0.40 (1.19) |
| h_bfdur_Ter | ||||
| (0,10.8] | 69 (38.8 %) | 200 (44.5 %) | 237 (35.2 %) | 506 (38.9 %) |
| (10.8,34.9] | 31 (17.4 %) | 111 (24.7 %) | 128 (19.0 %) | 270 (20.8 %) |
| (34.9,Inf] | 78 (43.8 %) | 138 (30.7 %) | 309 (45.8 %) | 525 (40.4 %) |
| hs_bakery_prod_Ter | ||||
| (0,2] | 28 (15.7 %) | 105 (23.4 %) | 212 (31.5 %) | 345 (26.5 %) |
| (2,6] | 58 (32.6 %) | 151 (33.6 %) | 214 (31.8 %) | 423 (32.5 %) |
| (6,Inf] | 92 (51.7 %) | 193 (43.0 %) | 248 (36.8 %) | 533 (41.0 %) |
| hs_break_cer_Ter | ||||
| (0,1.1] | 31 (17.4 %) | 118 (26.3 %) | 142 (21.1 %) | 291 (22.4 %) |
| (1.1,5.5] | 60 (33.7 %) | 191 (42.5 %) | 270 (40.1 %) | 521 (40.0 %) |
| (5.5,Inf] | 87 (48.9 %) | 140 (31.2 %) | 262 (38.9 %) | 489 (37.6 %) |
| hs_dairy_Ter | ||||
| (0,14.6] | 29 (16.3 %) | 122 (27.2 %) | 208 (30.9 %) | 359 (27.6 %) |
| (14.6,25.6] | 79 (44.4 %) | 163 (36.3 %) | 223 (33.1 %) | 465 (35.7 %) |
| (25.6,Inf] | 70 (39.3 %) | 164 (36.5 %) | 243 (36.1 %) | 477 (36.7 %) |
| hs_fastfood_Ter | ||||
| (0,0.132] | 19 (10.7 %) | 36 (8.0 %) | 88 (13.1 %) | 143 (11.0 %) |
| (0.132,0.5] | 62 (34.8 %) | 204 (45.4 %) | 337 (50.0 %) | 603 (46.3 %) |
| (0.5,Inf] | 97 (54.5 %) | 209 (46.5 %) | 249 (36.9 %) | 555 (42.7 %) |
| hs_org_food_Ter | ||||
| (0,0.132] | 120 (67.4 %) | 179 (39.9 %) | 130 (19.3 %) | 429 (33.0 %) |
| (0.132,1] | 31 (17.4 %) | 131 (29.2 %) | 234 (34.7 %) | 396 (30.4 %) |
| (1,Inf] | 27 (15.2 %) | 139 (31.0 %) | 310 (46.0 %) | 476 (36.6 %) |
| hs_proc_meat_Ter | ||||
| (0,1.5] | 68 (38.2 %) | 121 (26.9 %) | 177 (26.3 %) | 366 (28.1 %) |
| (1.5,4] | 48 (27.0 %) | 184 (41.0 %) | 239 (35.5 %) | 471 (36.2 %) |
| (4,Inf] | 62 (34.8 %) | 144 (32.1 %) | 258 (38.3 %) | 464 (35.7 %) |
| hs_total_fish_Ter | ||||
| (0,1.5] | 55 (30.9 %) | 152 (33.9 %) | 182 (27.0 %) | 389 (29.9 %) |
| (1.5,3] | 52 (29.2 %) | 163 (36.3 %) | 239 (35.5 %) | 454 (34.9 %) |
| (3,Inf] | 71 (39.9 %) | 134 (29.8 %) | 253 (37.5 %) | 458 (35.2 %) |
| hs_total_fruits_Ter | ||||
| (0,7] | 51 (28.7 %) | 169 (37.6 %) | 193 (28.6 %) | 413 (31.7 %) |
| (7,14.1] | 48 (27.0 %) | 135 (30.1 %) | 224 (33.2 %) | 407 (31.3 %) |
| (14.1,Inf] | 79 (44.4 %) | 145 (32.3 %) | 257 (38.1 %) | 481 (37.0 %) |
| hs_total_lipids_Ter | ||||
| (0,3] | 53 (29.8 %) | 156 (34.7 %) | 188 (27.9 %) | 397 (30.5 %) |
| (3,7] | 49 (27.5 %) | 127 (28.3 %) | 227 (33.7 %) | 403 (31.0 %) |
| (7,Inf] | 76 (42.7 %) | 166 (37.0 %) | 259 (38.4 %) | 501 (38.5 %) |
| hs_total_sweets_Ter | ||||
| (0,4.1] | 56 (31.5 %) | 124 (27.6 %) | 164 (24.3 %) | 344 (26.4 %) |
| (4.1,8.5] | 64 (36.0 %) | 183 (40.8 %) | 269 (39.9 %) | 516 (39.7 %) |
| (8.5,Inf] | 58 (32.6 %) | 142 (31.6 %) | 241 (35.8 %) | 441 (33.9 %) |
| hs_total_veg_Ter | ||||
| (0,6] | 79 (44.4 %) | 166 (37.0 %) | 159 (23.6 %) | 404 (31.1 %) |
| (6,8.5] | 42 (23.6 %) | 112 (24.9 %) | 160 (23.7 %) | 314 (24.1 %) |
| (8.5,Inf] | 57 (32.0 %) | 171 (38.1 %) | 355 (52.7 %) | 583 (44.8 %) |
| hs_cd_c_Log2 | -4.01 (1.09) | -3.99 (1.12) | -3.94 (0.97) | -3.97 (1.04) |
| hs_co_c_Log2 | -2.40 (0.56) | -2.28 (0.67) | -2.37 (0.61) | -2.34 (0.63) |
| hs_cs_c_Log2 | 0.30 (0.56) | 0.43 (0.55) | 0.49 (0.59) | 0.44 (0.57) |
| hs_cu_c_Log2 | 9.84 (0.22) | 9.85 (0.25) | 9.81 (0.22) | 9.83 (0.23) |
| hs_hg_c_Log2 | -0.28 (1.82) | -0.35 (1.74) | -0.27 (1.59) | -0.30 (1.68) |
| hs_mo_c_Log2 | -0.37 (0.88) | -0.27 (0.88) | -0.33 (0.92) | -0.32 (0.90) |
| hs_dde_cadj_Log2 | 3.90 (1.26) | 4.71 (1.58) | 4.85 (1.42) | 4.67 (1.49) |
| hs_pcb153_cadj_Log2 | 2.95 (0.77) | 3.32 (0.80) | 3.87 (0.87) | 3.56 (0.90) |
| hs_pcb170_cadj_Log2 | -1.68 (3.44) | -0.88 (3.15) | 0.43 (2.53) | -0.31 (3.00) |
| hs_dep_cadj_Log2 | 0.28 (3.41) | 0.01 (3.21) | 0.23 (3.15) | 0.16 (3.21) |
| hs_pbde153_cadj_Log2 | -4.19 (3.79) | -5.13 (3.80) | -4.21 (3.82) | -4.53 (3.83) |
| hs_pfhxs_c_Log2 | -1.65 (1.02) | -1.85 (1.45) | -1.36 (1.23) | -1.57 (1.31) |
| hs_pfoa_c_Log2 | 0.58 (0.60) | 0.50 (0.57) | 0.69 (0.52) | 0.61 (0.55) |
| hs_pfos_c_Log2 | 0.52 (0.99) | 0.85 (1.15) | 1.17 (1.07) | 0.97 (1.11) |
| hs_prpa_cadj_Log2 | -0.50 (3.66) | -1.58 (4.04) | -1.91 (3.67) | -1.61 (3.82) |
| hs_mbzp_cadj_Log2 | 2.13 (1.22) | 2.59 (1.27) | 2.43 (1.18) | 2.44 (1.22) |
| hs_mibp_cadj_Log2 | 5.71 (1.12) | 5.52 (1.11) | 5.36 (1.10) | 5.46 (1.11) |
| hs_mnbp_cadj_Log2 | 4.53 (0.96) | 4.64 (1.01) | 4.74 (1.03) | 4.68 (1.02) |
outcome_cov <- cbind(covariate_data, outcome_BMI)
outcome_cov <- outcome_cov[, !duplicated(colnames(outcome_cov))]
#the full chemicals list
chemicals_full <- c(
"hs_as_c_Log2",
"hs_cd_c_Log2",
"hs_co_c_Log2",
"hs_cs_c_Log2",
"hs_cu_c_Log2",
"hs_hg_c_Log2",
"hs_mn_c_Log2",
"hs_mo_c_Log2",
"hs_pb_c_Log2",
"hs_tl_cdich_None",
"hs_dde_cadj_Log2",
"hs_ddt_cadj_Log2",
"hs_hcb_cadj_Log2",
"hs_pcb118_cadj_Log2",
"hs_pcb138_cadj_Log2",
"hs_pcb153_cadj_Log2",
"hs_pcb170_cadj_Log2",
"hs_pcb180_cadj_Log2",
"hs_dep_cadj_Log2",
"hs_detp_cadj_Log2",
"hs_dmdtp_cdich_None",
"hs_dmp_cadj_Log2",
"hs_dmtp_cadj_Log2",
"hs_pbde153_cadj_Log2",
"hs_pbde47_cadj_Log2",
"hs_pfhxs_c_Log2",
"hs_pfna_c_Log2",
"hs_pfoa_c_Log2",
"hs_pfos_c_Log2",
"hs_pfunda_c_Log2",
"hs_bpa_cadj_Log2",
"hs_bupa_cadj_Log2",
"hs_etpa_cadj_Log2",
"hs_mepa_cadj_Log2",
"hs_oxbe_cadj_Log2",
"hs_prpa_cadj_Log2",
"hs_trcs_cadj_Log2",
"hs_mbzp_cadj_Log2",
"hs_mecpp_cadj_Log2",
"hs_mehhp_cadj_Log2",
"hs_mehp_cadj_Log2",
"hs_meohp_cadj_Log2",
"hs_mep_cadj_Log2",
"hs_mibp_cadj_Log2",
"hs_mnbp_cadj_Log2",
"hs_ohminp_cadj_Log2",
"hs_oxominp_cadj_Log2",
"hs_cotinine_cdich_None",
"hs_globalexp2_None"
)
#postnatal diet for child
postnatal_diet <- c(
"h_bfdur_Ter",
"hs_bakery_prod_Ter",
"hs_beverages_Ter",
"hs_break_cer_Ter",
"hs_caff_drink_Ter",
"hs_dairy_Ter",
"hs_fastfood_Ter",
"hs_org_food_Ter",
"hs_proc_meat_Ter",
"hs_readymade_Ter",
"hs_total_bread_Ter",
"hs_total_cereal_Ter",
"hs_total_fish_Ter",
"hs_total_fruits_Ter",
"hs_total_lipids_Ter",
"hs_total_meat_Ter",
"hs_total_potatoes_Ter",
"hs_total_sweets_Ter",
"hs_total_veg_Ter",
"hs_total_yog_Ter"
)
chemicals_columns <- c(chemicals_full)
all_chemicals <- exposome %>% dplyr::select(all_of(chemicals_columns))
diet_columns <- c(postnatal_diet)
all_diet <- exposome %>% dplyr::select(all_of(diet_columns))
all_columns <- c(chemicals_full, postnatal_diet)
extracted_exposome <- exposome %>% dplyr::select(all_of(all_columns))
chemicals_outcome_cov <- cbind(outcome_cov, all_chemicals)
diet_outcome_cov <- cbind(outcome_cov, all_diet)
interested_data <- cbind(outcome_cov, extracted_exposome)
head(interested_data)
# Keep only the numeric columns of the combined data
interested_data_corr <- select_if(interested_data, is.numeric)
# Spearman rank correlation, which is less sensitive to skewed exposure distributions
cor_matrix <- cor(interested_data_corr, method = "spearman")
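# Diverging colour scale: dark red (-1), white (0), dark blue (+1)
custom_color_scale <- list(
  c(0, "darkred"),
  c(0.5, "white"),
  c(1, "darkblue")
)
plot_ly(
  z = cor_matrix,
  x = colnames(cor_matrix),
  y = colnames(cor_matrix),
  type = "heatmap",
  colorscale = custom_color_scale,
  zmin = -1, # fix the colour range so that white corresponds to a correlation of 0
  zmax = 1
) %>%
  layout(
    title = "Correlation Matrix",
    xaxis = list(tickangle = -90),
    yaxis = list(side = "left")
  )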
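The penalized regressions that follow need a numeric predictor matrix and a reproducible train/test split. The sketch below shows one common way to build both when some predictors are factors. It is an illustration on simulated data with hypothetical column names (`outcome`, `exposure`, `cohort`), not the code used for the reported results.

```r
# Toy data standing in for the study data (column names are hypothetical).
set.seed(101)
toy <- data.frame(
  outcome  = rnorm(100),
  exposure = rnorm(100),
  cohort   = factor(sample(1:3, 100, replace = TRUE), levels = 1:3)
)

# 70/30 split on row indices; train and test indices do not overlap.
n <- nrow(toy)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

# model.matrix() dummy-codes factor columns, so the result is numeric.
# The intercept column is dropped because penalized regression functions
# such as glmnet fit their own intercept.
x_all <- model.matrix(outcome ~ ., data = toy)[, -1, drop = FALSE]
x_train_toy <- x_all[train_idx, , drop = FALSE]
x_test_toy  <- x_all[test_idx, , drop = FALSE]

str(x_train_toy)
```

Building the design matrix once and then indexing rows keeps the dummy-coded columns identical between the training and test sets.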
# LASSO: 70/30 train/test split
set.seed(101)
n_obs <- nrow(chemicals_outcome_cov)
train_indices <- sample(seq_len(n_obs), size = floor(0.7 * n_obs))
test_indices <- setdiff(seq_len(n_obs), train_indices)
# Note: as.matrix() on a data frame that contains factor columns returns a character matrix;
# model.matrix() or data.matrix() gives a numeric design matrix.
predictor_names <- setdiff(names(chemicals_outcome_cov), "hs_zbmi_who")
x_train <- as.matrix(chemicals_outcome_cov[train_indices, predictor_names])
y_train <- chemicals_outcome_cov$hs_zbmi_who[train_indices]
x_test <- as.matrix(chemicals_outcome_cov[test_indices, predictor_names])
y_test <- chemicals_outcome_cov$hs_zbmi_who[test_indices]
x_train_chemicals_only <- as.matrix(chemicals_outcome_cov[train_indices, chemicals_full])
x_test_chemicals_only <- as.matrix(chemicals_outcome_cov[test_indices, chemicals_full])
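# LASSO (alpha = 1) on the chemical exposures only; lambda is chosen by cross-validation
fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 1, family = "gaussian")
# Predictions on the held-out test set at the lambda with the lowest CV error
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)
# Coefficient paths come from the underlying glmnet fit; plotting the cv.glmnet object shows the CV error curve
plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficients Path (Without Covariates)")
best_lambda <- fit_without_covariates_train$lambda.min # lambda that minimizes the cross-validated MSE
coef(fit_without_covariates_train, s = best_lambda)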
## 50 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -4.7797230131
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.0238815730
## hs_co_c_Log2 -0.0011670319
## hs_cs_c_Log2 0.0771865955
## hs_cu_c_Log2 0.6071183261
## hs_hg_c_Log2 -0.0075730086
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.0992489424
## hs_pb_c_Log2 -0.0056257448
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0378984008
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.1721262187
## hs_pcb170_cadj_Log2 -0.0557570999
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.0186165147
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.0357794002
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 -0.0019079468
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.1360824261
## hs_pfos_c_Log2 -0.0478302901
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 .
## hs_oxbe_cadj_Log2 0.0008622765
## hs_prpa_cadj_Log2 0.0011728557
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.0373221816
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.0477304169
## hs_mnbp_cadj_Log2 -0.0036235331
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.231997
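A test MSE is easier to interpret next to a naive baseline. As a rough reference, the summary tables report an overall SD of 1.19 for hs_zbmi_who, which corresponds to a variance of about 1.42, so a test MSE near 1.23 would be a modest improvement over always predicting the mean (assuming the test subset has a similar variance). The sketch below shows how an out-of-sample R² against that baseline could be computed. `test_mse` and `oos_r2` are hypothetical helpers, and the data are simulated.

```r
# Mean squared error between observed and predicted values.
test_mse <- function(y_true, y_pred) mean((y_true - y_pred)^2)

# Out-of-sample R^2: improvement over predicting the training mean
# for every test subject (0 = no better than the mean, 1 = perfect).
oos_r2 <- function(y_train, y_test, y_pred) {
  baseline <- test_mse(y_test, mean(y_train))
  1 - test_mse(y_test, y_pred) / baseline
}

# Illustrative data only: a single predictor with a strong linear signal.
set.seed(3)
n <- 300
x <- rnorm(n)
y <- 1.5 * x + rnorm(n)
train <- seq_len(n) <= 210 # first 70% for training

fit <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))
pred <- predict(fit, newdata = data.frame(x = x[!train]))

toy_r2 <- oos_r2(y[train], y[!train], pred)
cat("Toy out-of-sample R^2:", round(toy_r2, 3), "\n")
```

With the objects defined above, the analogous call would be `oos_r2(y_train, y_test, fit_without_covariates_test)`.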
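# RIDGE (alpha = 0) on the chemical exposures only; lambda is chosen by cross-validation
fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 0, family = "gaussian")
# Predictions on the held-out test set at the lambda with the lowest CV error
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)
# Coefficient paths come from the underlying glmnet fit; plotting the cv.glmnet object shows the CV error curve
plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficients Path (Without Covariates)")
best_lambda <- fit_without_covariates_train$lambda.min # lambda that minimizes the cross-validated MSE
coef(fit_without_covariates_train, s = best_lambda)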
## 50 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -4.469806e+00
## hs_as_c_Log2 6.590433e-03
## hs_cd_c_Log2 -4.093355e-02
## hs_co_c_Log2 -5.049922e-02
## hs_cs_c_Log2 1.230373e-01
## hs_cu_c_Log2 6.078479e-01
## hs_hg_c_Log2 -3.225520e-02
## hs_mn_c_Log2 -3.089195e-02
## hs_mo_c_Log2 -1.068154e-01
## hs_pb_c_Log2 -5.295956e-02
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -4.888006e-02
## hs_ddt_cadj_Log2 4.045085e-03
## hs_hcb_cadj_Log2 -1.857150e-02
## hs_pcb118_cadj_Log2 1.400112e-02
## hs_pcb138_cadj_Log2 -3.614513e-02
## hs_pcb153_cadj_Log2 -1.223407e-01
## hs_pcb170_cadj_Log2 -5.267521e-02
## hs_pcb180_cadj_Log2 -1.074695e-02
## hs_dep_cadj_Log2 -2.548881e-02
## hs_detp_cadj_Log2 8.051621e-03
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 -2.097690e-03
## hs_dmtp_cadj_Log2 7.300567e-05
## hs_pbde153_cadj_Log2 -3.315313e-02
## hs_pbde47_cadj_Log2 5.273953e-03
## hs_pfhxs_c_Log2 -2.966308e-02
## hs_pfna_c_Log2 2.336166e-02
## hs_pfoa_c_Log2 -1.519872e-01
## hs_pfos_c_Log2 -6.495855e-02
## hs_pfunda_c_Log2 1.248503e-02
## hs_bpa_cadj_Log2 3.832688e-04
## hs_bupa_cadj_Log2 6.588467e-03
## hs_etpa_cadj_Log2 -6.098679e-03
## hs_mepa_cadj_Log2 -1.638466e-02
## hs_oxbe_cadj_Log2 1.390524e-02
## hs_prpa_cadj_Log2 1.258510e-02
## hs_trcs_cadj_Log2 2.878805e-03
## hs_mbzp_cadj_Log2 5.550048e-02
## hs_mecpp_cadj_Log2 1.627174e-03
## hs_mehhp_cadj_Log2 2.316991e-02
## hs_mehp_cadj_Log2 -1.662304e-02
## hs_meohp_cadj_Log2 1.137436e-02
## hs_mep_cadj_Log2 3.371106e-03
## hs_mibp_cadj_Log2 -5.391219e-02
## hs_mnbp_cadj_Log2 -4.383016e-02
## hs_ohminp_cadj_Log2 -2.886768e-02
## hs_oxominp_cadj_Log2 2.204660e-02
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.188752
# ELASTIC NET
fit_without_covariates_train <- cv.glmnet(x_train_chemicals_only, y_train, alpha = 0.5, family = "gaussian")
fit_without_covariates_test <- predict(fit_without_covariates_train, s = "lambda.min", newx = x_test_chemicals_only)
test_mse_without_covariates <- mean((y_test - fit_without_covariates_test)^2)
plot(fit_without_covariates_train$glmnet.fit, xvar = "lambda", main = "Coefficients Path (Without Covariates)")
best_lambda <- fit_without_covariates_train$lambda.min # lambda that minimizes the MSE
coef(fit_without_covariates_train, s = best_lambda)
## 50 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -4.785950188
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.025843356
## hs_co_c_Log2 -0.005835867
## hs_cs_c_Log2 0.084715330
## hs_cu_c_Log2 0.607379616
## hs_hg_c_Log2 -0.009800093
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.099724922
## hs_pb_c_Log2 -0.010318890
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.039528137
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.169008355
## hs_pcb170_cadj_Log2 -0.055808065
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.019034348
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.035464586
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 -0.006816020
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.135997766
## hs_pfos_c_Log2 -0.047692264
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 .
## hs_oxbe_cadj_Log2 0.002529961
## hs_prpa_cadj_Log2 0.001735800
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.040317847
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.047892677
## hs_mnbp_cadj_Log2 -0.008483913
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
cat("Model without Covariates - Test MSE:", test_mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.228805
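The test MSE values above are easier to read next to a null-model baseline, i.e. the MSE obtained by predicting the training mean for every test observation. The sketch below uses simulated stand-in data (the `*_demo` objects and the `test_r2` helper are hypothetical and not part of the analysis); with the real objects, `mean((y_test - mean(y_train))^2)` would give the same baseline.

```r
# Simulated stand-in for the outcome; in the analysis, y_train and y_test hold hs_zbmi_who.
set.seed(101)
y_demo <- rnorm(200, mean = 0.4, sd = 1.2)
train_idx_demo <- sample(seq_along(y_demo), size = floor(0.7 * length(y_demo)))
y_train_demo <- y_demo[train_idx_demo]
y_test_demo <- y_demo[-train_idx_demo]

# Null model: predict the training mean for every test observation.
null_mse <- mean((y_test_demo - mean(y_train_demo))^2)

# Out-of-sample R^2 relative to the null model (hypothetical helper).
# 0 means no better than the training mean; negative means worse.
test_r2 <- function(model_mse, baseline_mse) 1 - model_mse / baseline_mse

cat("Null-model Test MSE:", null_mse, "\n")
cat("Test R^2 of a model with half the null MSE:", test_r2(null_mse / 2, null_mse), "\n")
```

A penalized model whose test MSE is close to the null-model value is adding little predictive information beyond the mean.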
# LASSO with train/test
set.seed(101)
train_indices <- sample(seq_len(nrow(diet_outcome_cov)), size = floor(0.7 * nrow(diet_outcome_cov)))
test_indices <- setdiff(seq_len(nrow(diet_outcome_cov)), train_indices)
diet_data <- diet_outcome_cov[, postnatal_diet]
x_diet_train <- model.matrix(~ . + 0, data = diet_data[train_indices, ])
x_diet_test <- model.matrix(~ . + 0, data = diet_data[test_indices, ])
covariates <- diet_outcome_cov[, c("e3_sex_None", "e3_yearbir_None", "h_edumc_None", "h_cohort", "hs_child_age_None")]
x_covariates_train <- model.matrix(~ . + 0, data = covariates[train_indices, ])
x_covariates_test <- model.matrix(~ . + 0, data = covariates[test_indices, ])
x_full_train <- cbind(x_diet_train, x_covariates_train)
x_full_test <- cbind(x_diet_test, x_covariates_test)
x_full_train[is.na(x_full_train)] <- 0
x_full_test[is.na(x_full_test)] <- 0
x_diet_train[is.na(x_diet_train)] <- 0
x_diet_test[is.na(x_diet_test)] <- 0
y_train <- as.numeric(diet_outcome_cov$hs_zbmi_who[train_indices])
y_test <- as.numeric(diet_outcome_cov$hs_zbmi_who[test_indices])
# fit models
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 1, family = "gaussian")
fit_without_covariates
##
## Call: cv.glmnet(x = x_diet_train, y = y_train, alpha = 1, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 0.06922 9 1.431 0.06022 5
## 1se 0.14570 1 1.442 0.06160 0
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 0.53256344
## h_bfdur_Ter(0,10.8] .
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] .
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] .
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] .
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] .
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] -0.13588632
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] .
## hs_total_fruits_Ter(14.1,Inf] -0.02481964
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.05164312
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] .
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.01594403
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.08180563
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.34942
# RIDGE
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 0, family = "gaussian")
fit_without_covariates
##
## Call: cv.glmnet(x = x_diet_train, y = y_train, alpha = 0, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 3.53 41 1.431 0.08497 40
## 1se 145.70 1 1.441 0.08233 40
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 0.5163069457
## h_bfdur_Ter(0,10.8] -0.0114164662
## h_bfdur_Ter(10.8,34.9] 0.0353770607
## h_bfdur_Ter(34.9,Inf] -0.0138651651
## hs_bakery_prod_Ter(2,6] 0.0228606785
## hs_bakery_prod_Ter(6,Inf] -0.0268639952
## hs_beverages_Ter(0.132,1] -0.0065939314
## hs_beverages_Ter(1,Inf] -0.0016124215
## hs_break_cer_Ter(1.1,5.5] -0.0034207548
## hs_break_cer_Ter(5.5,Inf] -0.0337182186
## hs_caff_drink_Ter(0.132,Inf] -0.0143879393
## hs_dairy_Ter(14.6,25.6] 0.0355023507
## hs_dairy_Ter(25.6,Inf] -0.0005581647
## hs_fastfood_Ter(0.132,0.5] 0.0161761119
## hs_fastfood_Ter(0.5,Inf] -0.0001750742
## hs_org_food_Ter(0.132,1] 0.0151677373
## hs_org_food_Ter(1,Inf] -0.0682466785
## hs_proc_meat_Ter(1.5,4] 0.0222199344
## hs_proc_meat_Ter(4,Inf] -0.0187135643
## hs_readymade_Ter(0.132,0.5] -0.0013536008
## hs_readymade_Ter(0.5,Inf] 0.0105115509
## hs_total_bread_Ter(7,17.5] -0.0035702530
## hs_total_bread_Ter(17.5,Inf] -0.0070550360
## hs_total_cereal_Ter(14.1,23.6] 0.0082269928
## hs_total_cereal_Ter(23.6,Inf] -0.0131001584
## hs_total_fish_Ter(1.5,3] -0.0346609367
## hs_total_fish_Ter(3,Inf] -0.0051749487
## hs_total_fruits_Ter(7,14.1] 0.0266413533
## hs_total_fruits_Ter(14.1,Inf] -0.0389551124
## hs_total_lipids_Ter(3,7] -0.0022752284
## hs_total_lipids_Ter(7,Inf] -0.0476627593
## hs_total_meat_Ter(6,9] 0.0007524275
## hs_total_meat_Ter(9,Inf] 0.0005196923
## hs_total_potatoes_Ter(3,4] 0.0105526823
## hs_total_potatoes_Ter(4,Inf] 0.0048180175
## hs_total_sweets_Ter(4.1,8.5] -0.0392140671
## hs_total_sweets_Ter(8.5,Inf] -0.0010028529
## hs_total_veg_Ter(6,8.5] 0.0009962184
## hs_total_veg_Ter(8.5,Inf] -0.0556956882
## hs_total_yog_Ter(6,8.5] -0.0102351610
## hs_total_yog_Ter(8.5,Inf] -0.0089303177
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.326308
# ELASTIC NET
fit_without_covariates <- cv.glmnet(x_diet_train, y_train, alpha = 0.5, family = "gaussian")
fit_without_covariates
##
## Call: cv.glmnet(x = x_diet_train, y = y_train, alpha = 0.5, family = "gaussian")
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 0.07218 16 1.430 0.05641 12
## 1se 0.29139 1 1.444 0.05877 0
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 0.650606526
## h_bfdur_Ter(0,10.8] .
## h_bfdur_Ter(10.8,34.9] 0.039832328
## h_bfdur_Ter(34.9,Inf] .
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.052635590
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] -0.054788470
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.053455833
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] .
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] -0.185235916
## hs_proc_meat_Ter(1.5,4] 0.008558872
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] -0.057540803
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.017171763
## hs_total_fruits_Ter(14.1,Inf] -0.054914989
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.094342286
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] .
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.089860153
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.118161721
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_diet_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.335144
set.seed(101)
train_indices <- sample(seq_len(nrow(interested_data)), size = floor(0.7 * nrow(interested_data)))
test_indices <- setdiff(seq_len(nrow(interested_data)), train_indices)
diet_data <- interested_data[, postnatal_diet]
x_diet_train <- model.matrix(~ . + 0, data = diet_data[train_indices, ])
x_diet_test <- model.matrix(~ . + 0, data = diet_data[test_indices, ])
chemical_data <- interested_data[, chemicals_full]
x_chemical_train <- as.matrix(chemical_data[train_indices, ])
x_chemical_test <- as.matrix(chemical_data[test_indices, ])
covariates <- interested_data[, c("e3_sex_None", "e3_yearbir_None", "h_edumc_None", "h_cohort", "hs_child_age_None")]
x_covariates_train <- model.matrix(~ . + 0, data = covariates[train_indices, ])
x_covariates_test <- model.matrix(~ . + 0, data = covariates[test_indices, ])
# combine diet and chemical data with and without covariates
x_combined_train <- cbind(x_diet_train, x_chemical_train)
x_combined_test <- cbind(x_diet_test, x_chemical_test)
x_full_train <- cbind(x_combined_train, x_covariates_train)
x_full_test <- cbind(x_combined_test, x_covariates_test)
# replace any remaining missing values in the design matrices with 0
x_full_train[is.na(x_full_train)] <- 0
x_full_test[is.na(x_full_test)] <- 0
x_combined_train[is.na(x_combined_train)] <- 0
x_combined_test[is.na(x_combined_test)] <- 0
y_train <- as.numeric(interested_data$hs_zbmi_who[train_indices])
y_test <- as.numeric(interested_data$hs_zbmi_who[test_indices])
# LASSO
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 1, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 90 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.016149911
## h_bfdur_Ter(0,10.8] -0.129594522
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] .
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.217291423
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.009808165
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 0.070972556
## hs_fastfood_Ter(0.5,Inf] .
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] .
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] 0.011160944
## hs_total_bread_Ter(7,17.5] -0.010168208
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] -0.024288530
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] .
## hs_total_fruits_Ter(14.1,Inf] -0.016129393
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.047350302
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] 0.018317955
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.006515994
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.041036632
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.022337287
## hs_co_c_Log2 -0.003616434
## hs_cs_c_Log2 0.070483114
## hs_cu_c_Log2 0.656568320
## hs_hg_c_Log2 -0.012267249
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.097496432
## hs_pb_c_Log2 .
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.029771276
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.226942147
## hs_pcb170_cadj_Log2 -0.054403335
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.017878387
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.035568595
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 .
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.125219198
## hs_pfos_c_Log2 -0.047655946
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 .
## hs_oxbe_cadj_Log2 .
## hs_prpa_cadj_Log2 .
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.043689764
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.040902710
## hs_mnbp_cadj_Log2 -0.007173325
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.200253
# RIDGE
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 0, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 90 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -3.7486876482
## h_bfdur_Ter(0,10.8] -0.0862270481
## h_bfdur_Ter(10.8,34.9] 0.0187498222
## h_bfdur_Ter(34.9,Inf] 0.0718972907
## hs_bakery_prod_Ter(2,6] -0.0033853186
## hs_bakery_prod_Ter(6,Inf] -0.1580980396
## hs_beverages_Ter(0.132,1] 0.0052318976
## hs_beverages_Ter(1,Inf] -0.0339118523
## hs_break_cer_Ter(1.1,5.5] 0.0042988311
## hs_break_cer_Ter(5.5,Inf] -0.0503391950
## hs_caff_drink_Ter(0.132,Inf] 0.0156001183
## hs_dairy_Ter(14.6,25.6] 0.0416574408
## hs_dairy_Ter(25.6,Inf] -0.0174860568
## hs_fastfood_Ter(0.132,0.5] 0.0650667870
## hs_fastfood_Ter(0.5,Inf] -0.0300919849
## hs_org_food_Ter(0.132,1] 0.0284491409
## hs_org_food_Ter(1,Inf] -0.0490021669
## hs_proc_meat_Ter(1.5,4] 0.0055207383
## hs_proc_meat_Ter(4,Inf] -0.0063080789
## hs_readymade_Ter(0.132,0.5] 0.0307292842
## hs_readymade_Ter(0.5,Inf] 0.0632539981
## hs_total_bread_Ter(7,17.5] -0.0544944827
## hs_total_bread_Ter(17.5,Inf] 0.0146129335
## hs_total_cereal_Ter(14.1,23.6] -0.0004875292
## hs_total_cereal_Ter(23.6,Inf] 0.0180167268
## hs_total_fish_Ter(1.5,3] -0.0683250014
## hs_total_fish_Ter(3,Inf] 0.0112125503
## hs_total_fruits_Ter(7,14.1] 0.0353241028
## hs_total_fruits_Ter(14.1,Inf] -0.0433100932
## hs_total_lipids_Ter(3,7] -0.0171427895
## hs_total_lipids_Ter(7,Inf] -0.0848619938
## hs_total_meat_Ter(6,9] 0.0172861408
## hs_total_meat_Ter(9,Inf] 0.0044053472
## hs_total_potatoes_Ter(3,4] 0.0536415284
## hs_total_potatoes_Ter(4,Inf] -0.0115575388
## hs_total_sweets_Ter(4.1,8.5] -0.0692484887
## hs_total_sweets_Ter(8.5,Inf] -0.0097071229
## hs_total_veg_Ter(6,8.5] 0.0031586461
## hs_total_veg_Ter(8.5,Inf] -0.0567605211
## hs_total_yog_Ter(6,8.5] -0.0245534422
## hs_total_yog_Ter(8.5,Inf] -0.0386998840
## hs_as_c_Log2 0.0050439215
## hs_cd_c_Log2 -0.0352737869
## hs_co_c_Log2 -0.0396473666
## hs_cs_c_Log2 0.0905666600
## hs_cu_c_Log2 0.5291861050
## hs_hg_c_Log2 -0.0253437065
## hs_mn_c_Log2 -0.0187832842
## hs_mo_c_Log2 -0.0835328881
## hs_pb_c_Log2 -0.0275390915
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0366806354
## hs_ddt_cadj_Log2 0.0032185740
## hs_hcb_cadj_Log2 -0.0317509983
## hs_pcb118_cadj_Log2 0.0025521400
## hs_pcb138_cadj_Log2 -0.0518399321
## hs_pcb153_cadj_Log2 -0.1215197442
## hs_pcb170_cadj_Log2 -0.0418593821
## hs_pcb180_cadj_Log2 -0.0225049584
## hs_dep_cadj_Log2 -0.0189572104
## hs_detp_cadj_Log2 0.0059280868
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 -0.0024527279
## hs_dmtp_cadj_Log2 0.0008420662
## hs_pbde153_cadj_Log2 -0.0277474044
## hs_pbde47_cadj_Log2 0.0052481134
## hs_pfhxs_c_Log2 -0.0305593699
## hs_pfna_c_Log2 -0.0041077407
## hs_pfoa_c_Log2 -0.1108211867
## hs_pfos_c_Log2 -0.0475012252
## hs_pfunda_c_Log2 0.0072385180
## hs_bpa_cadj_Log2 -0.0063616978
## hs_bupa_cadj_Log2 0.0036910227
## hs_etpa_cadj_Log2 -0.0049963326
## hs_mepa_cadj_Log2 -0.0096168009
## hs_oxbe_cadj_Log2 0.0101184198
## hs_prpa_cadj_Log2 0.0061492375
## hs_trcs_cadj_Log2 0.0062007460
## hs_mbzp_cadj_Log2 0.0427566149
## hs_mecpp_cadj_Log2 0.0064702943
## hs_mehhp_cadj_Log2 0.0137458333
## hs_mehp_cadj_Log2 -0.0049569930
## hs_meohp_cadj_Log2 0.0093327409
## hs_mep_cadj_Log2 0.0064280760
## hs_mibp_cadj_Log2 -0.0385576131
## hs_mnbp_cadj_Log2 -0.0344895032
## hs_ohminp_cadj_Log2 -0.0210558232
## hs_oxominp_cadj_Log2 0.0113725933
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.154844
# ELASTIC NET
fit_without_covariates <- cv.glmnet(x_combined_train, y_train, alpha = 0.5, family = "gaussian")
predictions_without_covariates <- predict(fit_without_covariates, s = "lambda.min", newx = x_combined_test)
mse_without_covariates <- mean((y_test - predictions_without_covariates)^2)
plot(fit_without_covariates$glmnet.fit, xvar = "lambda", main = "Coefficient Path (Without Covariates)")
best_lambda <- fit_without_covariates$lambda.min # lambda that minimizes the MSE
coef(fit_without_covariates, s = best_lambda)
## 90 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.0446731390
## h_bfdur_Ter(0,10.8] -0.1251508851
## h_bfdur_Ter(10.8,34.9] .
## h_bfdur_Ter(34.9,Inf] 0.0089409662
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.2141179309
## hs_beverages_Ter(0.132,1] .
## hs_beverages_Ter(1,Inf] .
## hs_break_cer_Ter(1.1,5.5] .
## hs_break_cer_Ter(5.5,Inf] .
## hs_caff_drink_Ter(0.132,Inf] .
## hs_dairy_Ter(14.6,25.6] 0.0175630268
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 0.0739162130
## hs_fastfood_Ter(0.5,Inf] .
## hs_org_food_Ter(0.132,1] .
## hs_org_food_Ter(1,Inf] -0.0049809964
## hs_proc_meat_Ter(1.5,4] .
## hs_proc_meat_Ter(4,Inf] .
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] 0.0170253867
## hs_total_bread_Ter(7,17.5] -0.0178078486
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_cereal_Ter(14.1,23.6] .
## hs_total_cereal_Ter(23.6,Inf] .
## hs_total_fish_Ter(1.5,3] -0.0311197485
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.0058224545
## hs_total_fruits_Ter(14.1,Inf] -0.0180115810
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.0529611086
## hs_total_meat_Ter(6,9] .
## hs_total_meat_Ter(9,Inf] .
## hs_total_potatoes_Ter(3,4] 0.0233163117
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.0128007469
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.0436127333
## hs_total_yog_Ter(6,8.5] .
## hs_total_yog_Ter(8.5,Inf] .
## hs_as_c_Log2 .
## hs_cd_c_Log2 -0.0242086233
## hs_co_c_Log2 -0.0084586207
## hs_cs_c_Log2 0.0772668783
## hs_cu_c_Log2 0.6571106900
## hs_hg_c_Log2 -0.0143619887
## hs_mn_c_Log2 .
## hs_mo_c_Log2 -0.0974389913
## hs_pb_c_Log2 .
## hs_tl_cdich_None .
## hs_dde_cadj_Log2 -0.0321212412
## hs_ddt_cadj_Log2 .
## hs_hcb_cadj_Log2 .
## hs_pcb118_cadj_Log2 .
## hs_pcb138_cadj_Log2 .
## hs_pcb153_cadj_Log2 -0.2221589832
## hs_pcb170_cadj_Log2 -0.0546331904
## hs_pcb180_cadj_Log2 .
## hs_dep_cadj_Log2 -0.0179999867
## hs_detp_cadj_Log2 .
## hs_dmdtp_cdich_None .
## hs_dmp_cadj_Log2 .
## hs_dmtp_cadj_Log2 .
## hs_pbde153_cadj_Log2 -0.0351341084
## hs_pbde47_cadj_Log2 .
## hs_pfhxs_c_Log2 -0.0055363055
## hs_pfna_c_Log2 .
## hs_pfoa_c_Log2 -0.1254532888
## hs_pfos_c_Log2 -0.0469893259
## hs_pfunda_c_Log2 .
## hs_bpa_cadj_Log2 .
## hs_bupa_cadj_Log2 .
## hs_etpa_cadj_Log2 .
## hs_mepa_cadj_Log2 .
## hs_oxbe_cadj_Log2 .
## hs_prpa_cadj_Log2 0.0001965683
## hs_trcs_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.0457827093
## hs_mecpp_cadj_Log2 .
## hs_mehhp_cadj_Log2 .
## hs_mehp_cadj_Log2 .
## hs_meohp_cadj_Log2 .
## hs_mep_cadj_Log2 .
## hs_mibp_cadj_Log2 -0.0415220843
## hs_mnbp_cadj_Log2 -0.0111086286
## hs_ohminp_cadj_Log2 .
## hs_oxominp_cadj_Log2 .
## hs_cotinine_cdich_None .
## hs_globalexp2_None .
cat("Model without Covariates - Test MSE:", mse_without_covariates, "\n")
## Model without Covariates - Test MSE: 1.198308
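The data below were selected based on the features retained by the elastic net models without covariates.
It is still undecided whether to keep the continuous outcome or switch to a dichotomous one (which would allow sensitivity and specificity to be reported). The next step is to try freezing the covariates in the lasso, ridge, or elastic net models.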
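One way to read "freezing" the covariates is to leave them unpenalized so they always stay in the model while the exposures are still shrunk. In glmnet this can be done with the `penalty.factor` argument. The sketch below is only an illustration on simulated data: the column names (`cov1`, `exp1`, ...) and all `*_sim` / `*_frozen` objects are hypothetical stand-ins for the covariate and exposure matrices built above.

```r
library(glmnet)

set.seed(101)
n <- 200
# Hypothetical stand-ins: 3 covariate columns followed by 10 exposure columns.
x_cov_sim <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("cov", 1:3)))
x_exp_sim <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("exp", 1:10)))
x_all_sim <- cbind(x_cov_sim, x_exp_sim)
y_sim <- 0.5 * x_cov_sim[, 1] + 0.8 * x_exp_sim[, 1] + rnorm(n)

# penalty.factor = 0 leaves a column unpenalized (always kept in the model);
# penalty.factor = 1 applies the usual penalty.
pf <- c(rep(0, ncol(x_cov_sim)), rep(1, ncol(x_exp_sim)))

fit_frozen <- cv.glmnet(x_all_sim, y_sim, alpha = 1, family = "gaussian",
                        penalty.factor = pf)

# Covariate coefficients are nonzero at every lambda; exposures can still drop out.
coef_frozen <- coef(fit_frozen, s = "lambda.min")[, 1]
print(round(coef_frozen, 3))
```

The same `pf` vector works for ridge (`alpha = 0`) and elastic net (`alpha = 0.5`); only its ordering has to match the column order of the design matrix.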
# selected chemicals noted in the elastic net results
chemicals_selected <- c(
"hs_cd_c_Log2",
"hs_co_c_Log2",
"hs_cs_c_Log2",
"hs_cu_c_Log2",
"hs_hg_c_Log2",
"hs_mo_c_Log2",
"hs_pb_c_Log2",
"hs_dde_cadj_Log2",
"hs_pcb153_cadj_Log2",
"hs_pcb170_cadj_Log2",
"hs_dep_cadj_Log2",
"hs_detp_cadj_Log2",
"hs_pbde153_cadj_Log2",
"hs_pfhxs_c_Log2",
"hs_pfoa_c_Log2",
"hs_pfos_c_Log2",
"hs_mepa_cadj_Log2",
"hs_oxbe_cadj_Log2",
"hs_prpa_cadj_Log2",
"hs_mbzp_cadj_Log2",
"hs_mibp_cadj_Log2",
"hs_mnbp_cadj_Log2")
# selected diet variables noted in the elastic net results
diet_selected <- c(
"h_bfdur_Ter",
"hs_bakery_prod_Ter",
"hs_break_cer_Ter",
"hs_dairy_Ter",
"hs_fastfood_Ter",
"hs_org_food_Ter",
"hs_proc_meat_Ter",
"hs_total_fish_Ter",
"hs_total_fruits_Ter",
"hs_total_lipids_Ter",
"hs_total_sweets_Ter",
"hs_total_veg_Ter"
)
combined_data_selected <- c(
"h_bfdur_Ter",
"hs_bakery_prod_Ter",
"hs_dairy_Ter",
"hs_fastfood_Ter",
"hs_org_food_Ter",
"hs_readymade_Ter",
"hs_total_bread_Ter",
"hs_total_fish_Ter",
"hs_total_fruits_Ter",
"hs_total_lipids_Ter",
"hs_total_potatoes_Ter",
"hs_total_sweets_Ter",
"hs_total_veg_Ter",
"hs_cd_c_Log2",
"hs_co_c_Log2",
"hs_cs_c_Log2",
"hs_cu_c_Log2",
"hs_hg_c_Log2",
"hs_mo_c_Log2",
"hs_pb_c_Log2",
"hs_dde_cadj_Log2",
"hs_pcb153_cadj_Log2",
"hs_pcb170_cadj_Log2",
"hs_dep_cadj_Log2",
"hs_pbde153_cadj_Log2",
"hs_pfhxs_c_Log2",
"hs_pfoa_c_Log2",
"hs_pfos_c_Log2",
"hs_prpa_cadj_Log2",
"hs_mbzp_cadj_Log2",
"hs_mibp_cadj_Log2",
"hs_mnbp_cadj_Log2"
)
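The variable lists above are typed by hand from the printed coefficients. As a cross-check, the retained variables could also be derived from the coefficient vector itself. The sketch below copies a few values from the diet-only elastic net output earlier in this report into a named vector (`enet_coefs_demo` is a hypothetical name; zeros stand for the `.` entries). With a fitted model, `coef(fit, s = "lambda.min")[, 1]` should return a named vector of the same shape.

```r
# A few coefficients copied from the diet-only elastic net output (0 = "." in the printout).
enet_coefs_demo <- c(
  "(Intercept)"               = 0.650606526,
  "h_bfdur_Ter(0,10.8]"       = 0,
  "h_bfdur_Ter(10.8,34.9]"    = 0.039832328,
  "hs_bakery_prod_Ter(6,Inf]" = -0.052635590,
  "hs_beverages_Ter(1,Inf]"   = 0,
  "hs_org_food_Ter(1,Inf]"    = -0.185235916,
  "hs_total_veg_Ter(6,8.5]"   = 0,
  "hs_total_veg_Ter(8.5,Inf]" = -0.118161721
)

# Keep terms with a nonzero coefficient, excluding the intercept.
nonzero_terms <- setdiff(names(enet_coefs_demo)[enet_coefs_demo != 0], "(Intercept)")

# Strip the tercile level suffix, e.g. "(1,Inf]", to recover the variable name.
# Names without a suffix (the Log2 chemical columns) pass through unchanged.
selected_vars_demo <- unique(sub("\\(.*$", "", nonzero_terms))
print(selected_vars_demo)
```

A variable is kept here if any of its tercile levels has a nonzero coefficient, which appears to be the rule used for the hand-typed lists.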
outcome_cov <- cbind(covariate_data, outcome_BMI)
outcome_cov <- outcome_cov[, !duplicated(colnames(outcome_cov))]
finalized_columns <- c(combined_data_selected)
final_selected_data <- exposome %>% dplyr::select(all_of(finalized_columns))
finalized_data <- cbind(outcome_cov, final_selected_data)
head(finalized_data)
numeric_vars <- finalized_data %>%
dplyr::select(where(is.numeric))
cor_matrix <- cor(numeric_vars, use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black", tl.srt = 60, tl.cex = 0.8)
set.seed(101)
# Splitting data into training and test sets
train_indices <- sample(seq_len(nrow(finalized_data)), size = floor(0.7 * nrow(finalized_data)))
test_indices <- setdiff(seq_len(nrow(finalized_data)), train_indices)
# Creating training and test datasets
train_data <- finalized_data[train_indices, ]
test_data <- finalized_data[test_indices, ]
# Separating predictors and outcome variable
x_train <- model.matrix(~ . + 0, data = train_data[ , !names(train_data) %in% "hs_zbmi_who"])
x_test <- model.matrix(~ . + 0, data = test_data[ , !names(test_data) %in% "hs_zbmi_who"])
y_train <- train_data$hs_zbmi_who
y_test <- test_data$hs_zbmi_who
fit_lasso <- cv.glmnet(x_train, y_train, alpha = 1, family = "gaussian")
plot(fit_lasso$glmnet.fit, xvar = "lambda", main = "Coefficients Path")
best_lambda <- fit_lasso$lambda.min
coef(fit_lasso, s = best_lambda)
## 62 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -6.056529401
## e3_sex_Nonefemale -0.168363086
## e3_sex_Nonemale .
## e3_yearbir_None2004 -0.108416038
## e3_yearbir_None2005 0.015830408
## e3_yearbir_None2006 .
## e3_yearbir_None2007 .
## e3_yearbir_None2008 .
## e3_yearbir_None2009 .
## h_edumc_None2 .
## h_edumc_None3 0.024936478
## h_cohort2 -0.065001689
## h_cohort3 0.498485647
## h_cohort4 0.448839670
## h_cohort5 .
## h_cohort6 0.274566775
## hs_child_age_None .
## h_bfdur_Ter(10.8,34.9] 0.009853977
## h_bfdur_Ter(34.9,Inf] 0.206688198
## hs_bakery_prod_Ter(2,6] -0.079379910
## hs_bakery_prod_Ter(6,Inf] -0.287940431
## hs_dairy_Ter(14.6,25.6] 0.036305012
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 0.087371043
## hs_fastfood_Ter(0.5,Inf] .
## hs_org_food_Ter(0.132,1] 0.014758429
## hs_org_food_Ter(1,Inf] -0.003334294
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] 0.055235035
## hs_total_bread_Ter(7,17.5] -0.086581886
## hs_total_bread_Ter(17.5,Inf] 0.006856624
## hs_total_fish_Ter(1.5,3] -0.014098467
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.019842672
## hs_total_fruits_Ter(14.1,Inf] -0.002110980
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.029588146
## hs_total_potatoes_Ter(3,4] 0.019966856
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.048986632
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.012007741
## hs_cd_c_Log2 -0.021414266
## hs_co_c_Log2 -0.022114848
## hs_cs_c_Log2 0.213157355
## hs_cu_c_Log2 0.788079558
## hs_hg_c_Log2 -0.029736248
## hs_mo_c_Log2 -0.106918910
## hs_pb_c_Log2 -0.046230061
## hs_dde_cadj_Log2 -0.064720561
## hs_pcb153_cadj_Log2 -0.313021060
## hs_pcb170_cadj_Log2 -0.061190488
## hs_dep_cadj_Log2 -0.018332035
## hs_pbde153_cadj_Log2 -0.030575626
## hs_pfhxs_c_Log2 .
## hs_pfoa_c_Log2 -0.128583627
## hs_pfos_c_Log2 .
## hs_prpa_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.063851782
## hs_mibp_cadj_Log2 -0.044082983
## hs_mnbp_cadj_Log2 -0.017333893
predictions_lasso <- predict(fit_lasso, s = "lambda.min", newx = x_test)
mse_lasso <- mean((y_test - predictions_lasso)^2)
rmse_lasso <- sqrt(mse_lasso)
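# NOTE: y_test is a continuous z-score, but roc() expects a binary response.
# As the message below shows, only the two lowest values (-3.58 and -2.22) are treated as
# control and case, so the AUC of 1 reported here is not a meaningful measure of performance.
roc_lasso <- roc(y_test, predictions_lasso)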
## Setting levels: control = -3.58, case = -2.22
## Setting direction: controls < cases
auc_lasso <- auc(roc_lasso)
cat("Lasso Test MSE:", mse_lasso, "\n")
## Lasso Test MSE: 1.161457
cat("Lasso Test RMSE:", rmse_lasso, "\n")
## Lasso Test RMSE: 1.077709
cat("Lasso Test AUC:", auc_lasso, "\n")
## Lasso Test AUC: 1
plot(roc_lasso, main = "ROC Curve (Lasso)")
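If the outcome is dichotomized, the AUC and sensitivity/specificity become interpretable. The sketch below is an illustration on simulated data, not the analysis itself: the cutoff of z-score > 1 is an assumption (it is a commonly used overweight threshold for WHO BMI-for-age z-scores at ages 5-19 and may not be the definition wanted here), and `auc_rank` is a hypothetical base-R helper that computes the rank-based (Mann-Whitney) AUC in place of pROC.

```r
set.seed(101)
# Simulated stand-ins for y_test (BMI z-score) and the model predictions.
z_true_demo <- rnorm(300, mean = 0.4, sd = 1.2)
z_pred_demo <- 0.5 * z_true_demo + rnorm(300, sd = 1)

# Assumed cutoff: z-score > 1 defines the "case" group.
cutoff <- 1
case_demo <- as.integer(z_true_demo > cutoff)

# Rank-based AUC: probability that a random case has a higher predicted
# value than a random control (ties counted as one half via average ranks).
auc_rank <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc_demo <- auc_rank(z_pred_demo, case_demo)

# Sensitivity and specificity when the same cutoff is applied to the predictions.
pred_case_demo <- as.integer(z_pred_demo > cutoff)
sens_demo <- sum(pred_case_demo == 1 & case_demo == 1) / sum(case_demo == 1)
spec_demo <- sum(pred_case_demo == 0 & case_demo == 0) / sum(case_demo == 0)

cat("AUC:", auc_demo, " Sensitivity:", sens_demo, " Specificity:", spec_demo, "\n")
```

With pROC, the equivalent call would pass the binary vector as the response and set `levels = c(0, 1)` and `direction = "<"` explicitly, so that the control and case groups are not inferred from the data.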
fit_ridge <- cv.glmnet(x_train, y_train, alpha = 0, family = "gaussian")
plot(fit_ridge$glmnet.fit, xvar = "lambda", main = "Coefficients Path")
best_lambda <- fit_ridge$lambda.min
coef(fit_ridge, s = best_lambda)
## 62 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -5.506665822
## e3_sex_Nonefemale -0.087702836
## e3_sex_Nonemale 0.087402624
## e3_yearbir_None2004 -0.116309944
## e3_yearbir_None2005 0.056970622
## e3_yearbir_None2006 -0.018414120
## e3_yearbir_None2007 0.021957431
## e3_yearbir_None2008 0.025846916
## e3_yearbir_None2009 0.030936201
## h_edumc_None2 0.055212238
## h_edumc_None3 0.068414042
## h_cohort2 -0.125274663
## h_cohort3 0.317193746
## h_cohort4 0.292921048
## h_cohort5 -0.025263276
## h_cohort6 0.193660644
## hs_child_age_None -0.010836848
## h_bfdur_Ter(10.8,34.9] 0.045883632
## h_bfdur_Ter(34.9,Inf] 0.173946349
## hs_bakery_prod_Ter(2,6] -0.077881239
## hs_bakery_prod_Ter(6,Inf] -0.239808801
## hs_dairy_Ter(14.6,25.6] 0.067133380
## hs_dairy_Ter(25.6,Inf] 0.006026921
## hs_fastfood_Ter(0.132,0.5] 0.082209045
## hs_fastfood_Ter(0.5,Inf] -0.020476480
## hs_org_food_Ter(0.132,1] 0.017363627
## hs_org_food_Ter(1,Inf] -0.044193460
## hs_readymade_Ter(0.132,0.5] 0.040152728
## hs_readymade_Ter(0.5,Inf] 0.090677701
## hs_total_bread_Ter(7,17.5] -0.091649770
## hs_total_bread_Ter(17.5,Inf] 0.027788762
## hs_total_fish_Ter(1.5,3] -0.061716572
## hs_total_fish_Ter(3,Inf] -0.021700771
## hs_total_fruits_Ter(7,14.1] 0.035358675
## hs_total_fruits_Ter(14.1,Inf] -0.028781089
## hs_total_lipids_Ter(3,7] -0.011001748
## hs_total_lipids_Ter(7,Inf] -0.060711731
## hs_total_potatoes_Ter(3,4] 0.035711239
## hs_total_potatoes_Ter(4,Inf] 0.001734892
## hs_total_sweets_Ter(4.1,8.5] -0.077820901
## hs_total_sweets_Ter(8.5,Inf] -0.007782675
## hs_total_veg_Ter(6,8.5] 0.010704282
## hs_total_veg_Ter(8.5,Inf] -0.033156441
## hs_cd_c_Log2 -0.035319275
## hs_co_c_Log2 -0.044529238
## hs_cs_c_Log2 0.202597863
## hs_cu_c_Log2 0.713939861
## hs_hg_c_Log2 -0.029266725
## hs_mo_c_Log2 -0.105371823
## hs_pb_c_Log2 -0.043118149
## hs_dde_cadj_Log2 -0.069459637
## hs_pcb153_cadj_Log2 -0.246311891
## hs_pcb170_cadj_Log2 -0.059596050
## hs_dep_cadj_Log2 -0.019299041
## hs_pbde153_cadj_Log2 -0.031005364
## hs_pfhxs_c_Log2 -0.002098677
## hs_pfoa_c_Log2 -0.145681432
## hs_pfos_c_Log2 -0.019471004
## hs_prpa_cadj_Log2 0.001734861
## hs_mbzp_cadj_Log2 0.066326947
## hs_mibp_cadj_Log2 -0.041795405
## hs_mnbp_cadj_Log2 -0.036158552
predictions_ridge <- predict(fit_ridge, s = "lambda.min", newx = x_test)
mse_ridge <- mean((y_test - predictions_ridge)^2)
rmse_ridge <- sqrt(mse_ridge)
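# Caution: y_test is a continuous z-score, but roc() expects a binary response.
# As the "Setting levels" message below shows, only two of its values are used
# as classes, so the resulting AUC is not a valid measure of model performance.
roc_ridge <- roc(y_test, predictions_ridge)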
## Setting levels: control = -3.58, case = -2.22
## Setting direction: controls < cases
auc_ridge <- auc(roc_ridge)
cat("Ridge Test MSE:", mse_ridge, "\n")
## Ridge Test MSE: 1.150747
cat("Ridge Test RMSE:", rmse_ridge, "\n")
## Ridge Test RMSE: 1.072729
cat("Ridge Test AUC:", auc_ridge, "\n")
## Ridge Test AUC: 1
plot(roc_ridge, main = "ROC Curve (Ridge)")
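The "Setting levels: control = -3.58, case = -2.22" message indicates that `roc()` received the continuous z-score as its response and treated two of its values as the classes, which would explain an AUC of exactly 1. Below is a minimal sketch of one alternative, assuming the outcome is first dichotomised (here at the median, as is done later in this report). It uses simulated data rather than the HELIX data, and `auc_rank`, `y_cont`, `pred`, and `y_bin` are illustrative names, not objects from the analysis.

```r
# Rank-based (Mann-Whitney) AUC for a 0/1 label and a continuous score.
auc_rank <- function(label, score) {
  stopifnot(all(label %in% c(0, 1)), length(label) == length(score))
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  r <- rank(score)  # average ranks, so ties are handled
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)
y_cont <- rnorm(200)                  # simulated continuous outcome (stand-in for a z-score)
pred   <- 0.5 * y_cont + rnorm(200)   # simulated, imperfect predictions
y_bin  <- as.integer(y_cont > median(y_cont))  # assumption: median split defines the classes

auc_demo <- auc_rank(y_bin, pred)
print(auc_demo)
```

With a binary label, every test observation contributes to the AUC, so the value reflects how well the predictions rank children above versus below the chosen cut-off.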
fit_enet <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "gaussian")
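# plot() on a cv.glmnet object draws the CV error curve; the coefficient path lives in $glmnet.fit
plot(fit_enet$glmnet.fit, xvar = "lambda", main = "Coefficients Path")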
best_lambda <- fit_enet$lambda.min
coef(fit_enet, s = best_lambda)
## 62 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -6.135902449
## e3_sex_Nonefemale -0.091328664
## e3_sex_Nonemale 0.078356308
## e3_yearbir_None2004 -0.106629256
## e3_yearbir_None2005 0.020246640
## e3_yearbir_None2006 .
## e3_yearbir_None2007 .
## e3_yearbir_None2008 .
## e3_yearbir_None2009 .
## h_edumc_None2 .
## h_edumc_None3 0.026571071
## h_cohort2 -0.080139872
## h_cohort3 0.478028559
## h_cohort4 0.438891150
## h_cohort5 .
## h_cohort6 0.267851096
## hs_child_age_None .
## h_bfdur_Ter(10.8,34.9] 0.015076770
## h_bfdur_Ter(34.9,Inf] 0.205015743
## hs_bakery_prod_Ter(2,6] -0.080046803
## hs_bakery_prod_Ter(6,Inf] -0.284248988
## hs_dairy_Ter(14.6,25.6] 0.039639471
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] 0.088885113
## hs_fastfood_Ter(0.5,Inf] .
## hs_org_food_Ter(0.132,1] 0.013880043
## hs_org_food_Ter(1,Inf] -0.008246874
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] 0.056877061
## hs_total_bread_Ter(7,17.5] -0.087779515
## hs_total_bread_Ter(17.5,Inf] 0.008235596
## hs_total_fish_Ter(1.5,3] -0.018480536
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] 0.020734197
## hs_total_fruits_Ter(14.1,Inf] -0.005694289
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] -0.032924994
## hs_total_potatoes_Ter(3,4] 0.020685984
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] -0.051802913
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] .
## hs_total_veg_Ter(8.5,Inf] -0.014437172
## hs_cd_c_Log2 -0.023265047
## hs_co_c_Log2 -0.024387720
## hs_cs_c_Log2 0.213269641
## hs_cu_c_Log2 0.787050001
## hs_hg_c_Log2 -0.030166372
## hs_mo_c_Log2 -0.107397502
## hs_pb_c_Log2 -0.045306072
## hs_dde_cadj_Log2 -0.065275097
## hs_pcb153_cadj_Log2 -0.308430760
## hs_pcb170_cadj_Log2 -0.061445158
## hs_dep_cadj_Log2 -0.018623289
## hs_pbde153_cadj_Log2 -0.030728037
## hs_pfhxs_c_Log2 .
## hs_pfoa_c_Log2 -0.131846844
## hs_pfos_c_Log2 .
## hs_prpa_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.065135803
## hs_mibp_cadj_Log2 -0.043589996
## hs_mnbp_cadj_Log2 -0.020027202
predictions_enet <- predict(fit_enet, s = "lambda.min", newx = x_test)
mse_enet <- mean((y_test - predictions_enet)^2)
rmse_enet <- sqrt(mse_enet)
# Caution: continuous y_test passed to roc(); the AUC below rests on two response values only.
roc_enet <- roc(y_test, predictions_enet)
## Setting levels: control = -3.58, case = -2.22
## Setting direction: controls < cases
auc_enet <- auc(roc_enet)
cat("Elastic Net Test MSE:", mse_enet, "\n")
## Elastic Net Test MSE: 1.161523
cat("Elastic Net Test RMSE:", rmse_enet, "\n")
## Elastic Net Test RMSE: 1.07774
cat("Elastic Net Test AUC:", auc_enet, "\n")
## Elastic Net Test AUC: 1
plot(roc_enet, main = "ROC Curve (Elastic Net)")
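The lasso, ridge, and elastic net fits differ only in the `alpha` argument passed to `cv.glmnet`. For `family = "gaussian"`, the glmnet documentation states the penalised objective as follows. The symbols are introduced here for illustration: $N$ is the number of training rows, $x_i$ the predictor row for child $i$, $\beta_0$ the intercept, and $\beta$ the coefficient vector.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\; \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
```

Setting $\alpha = 1$ leaves only the $\ell_1$ term (lasso), $\alpha = 0$ leaves only the squared $\ell_2$ term (ridge), and $\alpha = 0.5$, as used for `fit_enet`, weights the two parts equally. The $\ell_1$ part is what allows coefficients to be set exactly to zero, shown as `.` in the elastic net output.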
rf_model <- randomForest(x_train, y_train, ntree=500, importance=TRUE)
predictions_rf <- predict(rf_model, x_test)
mse_rf <- mean((y_test - predictions_rf)^2)
rmse_rf <- sqrt(mse_rf)
# Caution: continuous y_test passed to roc(); the AUC below rests on two response values only.
roc_rf <- roc(y_test, predictions_rf)
## Setting levels: control = -3.58, case = -2.22
## Setting direction: controls < cases
auc_rf <- auc(roc_rf)
cat("Random Forest Test MSE:", mse_rf, "\n")
## Random Forest Test MSE: 1.143216
cat("Random Forest Test RMSE:", rmse_rf, "\n")
## Random Forest Test RMSE: 1.069213
cat("Random Forest Test AUC:", auc_rf, "\n")
## Random Forest Test AUC: 1
plot(roc_rf, main = "ROC Curve (Random Forest)")
varImpPlot(rf_model)
gbm_model <- gbm(hs_zbmi_who ~ ., data = train_data,
                 distribution = "gaussian",
                 n.trees = 1000,
                 interaction.depth = 3,
                 n.minobsinnode = 10,
                 shrinkage = 0.01,
                 cv.folds = 5,
                 verbose = TRUE)
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.4330 nan 0.0100 0.0039
## 2 1.4294 nan 0.0100 0.0031
## 3 1.4260 nan 0.0100 0.0018
## 4 1.4230 nan 0.0100 0.0016
## 5 1.4183 nan 0.0100 0.0036
## 6 1.4144 nan 0.0100 0.0042
## 7 1.4102 nan 0.0100 0.0033
## 8 1.4064 nan 0.0100 0.0032
## 9 1.4018 nan 0.0100 0.0039
## 10 1.3984 nan 0.0100 0.0027
## 20 1.3622 nan 0.0100 0.0020
## 40 1.3002 nan 0.0100 0.0015
## 60 1.2525 nan 0.0100 0.0007
## 80 1.2121 nan 0.0100 0.0013
## 100 1.1770 nan 0.0100 0.0008
## 120 1.1461 nan 0.0100 0.0002
## 140 1.1191 nan 0.0100 -0.0000
## 160 1.0942 nan 0.0100 0.0002
## 180 1.0718 nan 0.0100 0.0007
## 200 1.0511 nan 0.0100 0.0003
## 220 1.0334 nan 0.0100 -0.0001
## 240 1.0163 nan 0.0100 -0.0001
## 260 1.0010 nan 0.0100 -0.0000
## 280 0.9857 nan 0.0100 0.0002
## 300 0.9721 nan 0.0100 -0.0005
## 320 0.9588 nan 0.0100 -0.0002
## 340 0.9458 nan 0.0100 -0.0002
## 360 0.9337 nan 0.0100 -0.0001
## 380 0.9220 nan 0.0100 -0.0001
## 400 0.9109 nan 0.0100 -0.0001
## 420 0.9003 nan 0.0100 -0.0004
## 440 0.8905 nan 0.0100 -0.0003
## 460 0.8810 nan 0.0100 -0.0004
## 480 0.8715 nan 0.0100 -0.0002
## 500 0.8623 nan 0.0100 -0.0000
## 520 0.8534 nan 0.0100 -0.0001
## 540 0.8450 nan 0.0100 -0.0002
## 560 0.8368 nan 0.0100 -0.0002
## 580 0.8290 nan 0.0100 -0.0001
## 600 0.8215 nan 0.0100 -0.0001
## 620 0.8135 nan 0.0100 -0.0001
## 640 0.8065 nan 0.0100 -0.0003
## 660 0.7991 nan 0.0100 -0.0003
## 680 0.7925 nan 0.0100 -0.0003
## 700 0.7856 nan 0.0100 -0.0005
## 720 0.7796 nan 0.0100 -0.0006
## 740 0.7733 nan 0.0100 -0.0006
## 760 0.7664 nan 0.0100 -0.0003
## 780 0.7601 nan 0.0100 -0.0002
## 800 0.7545 nan 0.0100 -0.0003
## 820 0.7487 nan 0.0100 -0.0003
## 840 0.7427 nan 0.0100 -0.0002
## 860 0.7367 nan 0.0100 -0.0004
## 880 0.7308 nan 0.0100 -0.0000
## 900 0.7254 nan 0.0100 -0.0001
## 920 0.7199 nan 0.0100 -0.0001
## 940 0.7144 nan 0.0100 -0.0002
## 960 0.7094 nan 0.0100 -0.0006
## 980 0.7039 nan 0.0100 -0.0002
## 1000 0.6990 nan 0.0100 -0.0003
# finding the best number of trees based on cross-validation
best_trees <- gbm.perf(gbm_model, method = "cv")
predictions_gbm <- predict(gbm_model, test_data, n.trees = best_trees)
mse_gbm <- mean((y_test - predictions_gbm)^2)
rmse_gbm <- sqrt(mse_gbm)
# Caution: continuous y_test passed to roc(); the AUC below rests on two response values only.
roc_gbm <- roc(y_test, predictions_gbm)
## Setting levels: control = -3.58, case = -2.22
## Setting direction: controls < cases
auc_gbm <- auc(roc_gbm)
cat("GBM Test MSE:", mse_gbm, "\n")
## GBM Test MSE: 1.123562
cat("GBM Test RMSE:", rmse_gbm, "\n")
## GBM Test RMSE: 1.059982
cat("GBM Test AUC:", auc_gbm, "\n")
## GBM Test AUC: 1
plot(roc_gbm, main = "ROC Curve (GBM)")
summary(gbm_model)
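The test RMSE values reported for the five models all fall between roughly 1.06 and 1.08. One way to put such numbers in context is to compare them with the error from predicting the training mean for every child. The sketch below does this on simulated data; the helper `regression_metrics` and the simulated variables are hypothetical and not part of the analysis.

```r
# Test-set error of a model next to a "predict the training mean" baseline.
regression_metrics <- function(y_true, y_pred, train_mean) {
  mse <- mean((y_true - y_pred)^2)
  baseline_mse <- mean((y_true - train_mean)^2)
  c(MSE = mse,
    RMSE = sqrt(mse),
    baseline_RMSE = sqrt(baseline_mse),
    R2_vs_baseline = 1 - mse / baseline_mse)
}

set.seed(7)
n <- 300
sim <- data.frame(x = rnorm(n))
sim$y <- 0.5 * sim$x + rnorm(n)   # simulated outcome with a weak signal
train <- sim[1:200, ]
test  <- sim[201:300, ]

fit <- lm(y ~ x, data = train)
metrics <- regression_metrics(test$y, predict(fit, newdata = test), mean(train$y))
print(round(metrics, 3))
```

An `R2_vs_baseline` near zero would mean the model adds little beyond the mean, whatever its absolute RMSE.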
profvis({
  control <- trainControl(method = "cv", number = 5)

  # lasso with cross-validation
  fit_lasso_cv <- train(x_train, y_train, method = "glmnet",
                        trControl = control,
                        tuneGrid = expand.grid(alpha = 1, lambda = fit_lasso$lambda.min))
  print(fit_lasso_cv)

  # ridge with cross-validation
  fit_ridge_cv <- train(x_train, y_train, method = "glmnet",
                        trControl = control,
                        tuneGrid = expand.grid(alpha = 0, lambda = fit_ridge$lambda.min))
  print(fit_ridge_cv)

  # enet with cross-validation
  fit_enet_cv <- train(x_train, y_train, method = "glmnet",
                       trControl = control,
                       tuneGrid = expand.grid(alpha = 0.5, lambda = fit_enet$lambda.min))
  print(fit_enet_cv)

  # random forest with cross-validation
  rf_cv <- train(x_train, y_train, method = "rf", trControl = control)
  print(rf_cv)

  # GBM with cross-validation
  gbm_cv <- train(hs_zbmi_who ~ ., data = train_data, method = "gbm",
                  trControl = control, verbose = FALSE)
  print(gbm_cv)
})
## glmnet
##
## 910 samples
## 61 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 729, 727, 728, 729, 727
## Resampling results:
##
## RMSE Rsquared MAE
## 1.054526 0.2325496 0.8371573
##
## Tuning parameter 'alpha' was held constant at a value of 1
## Tuning
## parameter 'lambda' was held constant at a value of 0.01696465
## glmnet
##
## 910 samples
## 61 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 728, 728, 728, 728, 728
## Resampling results:
##
## RMSE Rsquared MAE
## 1.069057 0.2090251 0.8468801
##
## Tuning parameter 'alpha' was held constant at a value of 0
## Tuning
## parameter 'lambda' was held constant at a value of 0.2578475
## glmnet
##
## 910 samples
## 61 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 728, 727, 729, 727, 729
## Resampling results:
##
## RMSE Rsquared MAE
## 1.074543 0.2018854 0.8459061
##
## Tuning parameter 'alpha' was held constant at a value of 0.5
## Tuning
## parameter 'lambda' was held constant at a value of 0.03091511
## Random Forest
##
## 910 samples
## 61 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 728, 727, 729, 728, 728
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 1.112479 0.1884249 0.8811975
## 31 1.076084 0.1993703 0.8449897
## 61 1.075272 0.1985940 0.8456755
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 61.
## Stochastic Gradient Boosting
##
## 910 samples
## 37 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 728, 729, 727, 729, 727
## Resampling results across tuning parameters:
##
## interaction.depth n.trees RMSE Rsquared MAE
## 1 50 1.100875 0.1650372 0.8698880
## 1 100 1.082768 0.1886242 0.8554560
## 1 150 1.080526 0.1944005 0.8540844
## 2 50 1.089094 0.1795390 0.8580456
## 2 100 1.091353 0.1826065 0.8627443
## 2 150 1.101216 0.1771400 0.8735445
## 3 50 1.076080 0.2027194 0.8526312
## 3 100 1.086496 0.1927068 0.8616613
## 3 150 1.101047 0.1811737 0.8709619
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 50, interaction.depth =
## 3, shrinkage = 0.1 and n.minobsinnode = 10.
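The "Resampling: Cross-Validated (5 fold)" summaries come from `trainControl(method = "cv", number = 5)`. To make the procedure concrete, here is a base R sketch of 5-fold cross-validation on simulated data with a plain linear model. It is not caret's internal code, and `sim`, `fold_id`, and `fold_rmse` are illustrative names.

```r
set.seed(42)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)   # simulated outcome

k <- 5
# Assign each row to one of k folds of equal size, in random order.
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, score on the held-out fold.
fold_rmse <- sapply(seq_len(k), function(f) {
  fit  <- lm(y ~ x1 + x2, data = sim[fold_id != f, ])
  pred <- predict(fit, newdata = sim[fold_id == f, ])
  sqrt(mean((sim$y[fold_id == f] - pred)^2))
})

cv_rmse <- mean(fold_rmse)   # the resampled RMSE is the average over folds
print(round(fold_rmse, 3))
print(round(cv_rmse, 3))
```

The spread of `fold_rmse` gives a rough sense of how much a cross-validated RMSE can move between folds, which is worth keeping in mind when models differ only in the second decimal place.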
First 10 rows and columns of the serum metabolomics data (rows are metabolites, columns are subject IDs):
load("/Users/allison/Library/CloudStorage/GoogleDrive-aflouie@usc.edu/My Drive/HELIX_data/metabol_serum.RData")
kable(metabol_serum.d[1:10,1:10], align="c", digits=2, format="pipe")
| | 430 | 1187 | 940 | 936 | 788 | 698 | 380 | 196 | 114 | 885 |
|---|---|---|---|---|---|---|---|---|---|---|
| metab_1 | -2.15 | -0.69 | -0.69 | -0.19 | -1.96 | -1.90 | -0.22 | -1.38 | -0.54 | -1.25 |
| metab_2 | -0.71 | -0.37 | -0.36 | -0.34 | -0.35 | -0.63 | -0.26 | -0.46 | -0.44 | -0.48 |
| metab_3 | 8.60 | 9.15 | 8.95 | 8.54 | 8.73 | 8.24 | 9.03 | 8.29 | 8.37 | 8.18 |
| metab_4 | 0.55 | -1.33 | -0.13 | -0.62 | -0.80 | -0.46 | 0.49 | 0.12 | -0.76 | -0.07 |
| metab_5 | 7.05 | 6.89 | 7.10 | 7.01 | 6.90 | 6.94 | 6.77 | 6.62 | 6.85 | 7.24 |
| metab_6 | 5.79 | 5.81 | 5.86 | 5.95 | 5.95 | 5.42 | 5.82 | 5.65 | 5.44 | 5.60 |
| metab_7 | 3.75 | 4.26 | 4.35 | 4.24 | 4.88 | 4.70 | 4.08 | 4.73 | 3.98 | 4.30 |
| metab_8 | 5.07 | 5.08 | 5.92 | 5.41 | 5.39 | 4.62 | 5.10 | 5.28 | 4.51 | 5.45 |
| metab_9 | -1.87 | -2.30 | -1.97 | -1.89 | -1.55 | -1.78 | -2.29 | -1.64 | -2.02 | -1.68 |
| metab_10 | -2.77 | -3.42 | -3.40 | -2.84 | -2.45 | -3.14 | -3.36 | -2.88 | -3.05 | -2.92 |
metabol_serum_transposed <- as.data.frame(t(metabol_serum.d))
metabol_serum_transposed$ID <- as.integer(rownames(metabol_serum_transposed))
# Move the ID column to the first position
metabol_serum_transposed <- metabol_serum_transposed[, c("ID", setdiff(names(metabol_serum_transposed), "ID"))]
# Now, the ID is the first column, and the layout is preserved
kable(head(metabol_serum_transposed), align = "c", digits = 2, format = "pipe")
| | ID | metab_1 | metab_2 | metab_3 | metab_4 | metab_5 | metab_6 | metab_7 | metab_8 | metab_9 | metab_10 | metab_11 | metab_12 | metab_13 | metab_14 | metab_15 | metab_16 | metab_17 | metab_18 | metab_19 | metab_20 | metab_21 | metab_22 | metab_23 | metab_24 | metab_25 | metab_26 | metab_27 | metab_28 | metab_29 | metab_30 | metab_31 | metab_32 | metab_33 | metab_34 | metab_35 | metab_36 | metab_37 | metab_38 | metab_39 | metab_40 | metab_41 | metab_42 | metab_43 | metab_44 | metab_45 | metab_46 | metab_47 | metab_48 | metab_49 | metab_50 | metab_51 | metab_52 | metab_53 | metab_54 | metab_55 | metab_56 | metab_57 | metab_58 | metab_59 | metab_60 | metab_61 | metab_62 | metab_63 | metab_64 | metab_65 | metab_66 | metab_67 | metab_68 | metab_69 | metab_70 | metab_71 | metab_72 | metab_73 | metab_74 | metab_75 | metab_76 | metab_77 | metab_78 | metab_79 | metab_80 | metab_81 | metab_82 | metab_83 | metab_84 | metab_85 | metab_86 | metab_87 | metab_88 | metab_89 | metab_90 | metab_91 | metab_92 | metab_93 | metab_94 | metab_95 | metab_96 | metab_97 | metab_98 | metab_99 | metab_100 | metab_101 | metab_102 | metab_103 | metab_104 | metab_105 | metab_106 | metab_107 | metab_108 | metab_109 | metab_110 | metab_111 | metab_112 | metab_113 | metab_114 | metab_115 | metab_116 | metab_117 | metab_118 | metab_119 | metab_120 | metab_121 | metab_122 | metab_123 | metab_124 | metab_125 | metab_126 | metab_127 | metab_128 | metab_129 | metab_130 | metab_131 | metab_132 | metab_133 | metab_134 | metab_135 | metab_136 | metab_137 | metab_138 | metab_139 | metab_140 | metab_141 | metab_142 | metab_143 | metab_144 | metab_145 | metab_146 | metab_147 | metab_148 | metab_149 | metab_150 | metab_151 | metab_152 | metab_153 | metab_154 | metab_155 | metab_156 | metab_157 | metab_158 | metab_159 | metab_160 | metab_161 | metab_162 | metab_163 | metab_164 | metab_165 | metab_166 | metab_167 | metab_168 | metab_169 | metab_170 | metab_171 | metab_172 | metab_173 | metab_174 | metab_175 | metab_176 | metab_177 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 430 | 430 | -2.15 | -0.71 | 8.60 | 0.55 | 7.05 | 5.79 | 3.75 | 5.07 | -1.87 | -2.77 | -3.31 | -2.91 | -2.94 | -1.82 | -4.40 | -4.10 | -5.41 | -5.13 | -5.35 | -3.39 | -5.08 | -6.06 | -6.06 | -4.99 | -4.46 | -4.63 | -3.27 | -4.61 | 2.17 | -1.73 | -4.97 | -4.90 | -2.63 | -5.29 | -2.38 | -4.06 | -5.11 | -5.35 | -4.80 | -3.92 | -3.92 | -5.47 | -4.22 | -2.56 | -3.93 | 5.15 | 6.03 | 10.20 | 5.14 | 7.82 | 12.31 | 7.27 | 7.08 | 1.79 | 7.73 | 7.98 | 1.96 | 6.15 | 0.98 | 0.60 | 4.42 | 4.36 | 5.85 | 1.03 | 2.74 | -2.53 | -2.05 | -2.91 | -1.61 | -1.63 | 5.03 | 0.14 | 6.23 | -2.95 | 1.29 | 1.70 | -2.83 | 4.55 | 4.05 | 2.56 | -0.29 | 8.33 | 9.93 | 4.89 | 1.28 | 2.16 | 5.82 | 8.95 | 7.72 | 8.41 | 4.71 | 0.10 | 2.02 | 0.16 | 5.82 | 7.45 | 6.17 | 6.81 | -0.70 | -1.25 | -0.65 | 2.05 | 3.39 | 4.94 | -0.69 | -1.44 | -2.06 | -2.44 | -1.30 | -0.73 | -1.52 | -2.43 | -3.26 | 1.97 | 0.03 | 1.09 | 3.98 | 4.56 | 4.16 | 0.42 | 3.48 | 4.88 | 3.84 | 4.70 | 4.04 | 1.58 | -0.76 | 1.75 | 2.48 | 4.43 | 4.68 | 3.29 | 0.97 | 1.03 | 0.44 | 1.55 | 2.26 | 2.72 | 0.12 | -0.90 | -0.50 | 0.02 | -0.18 | 1.02 | -2.69 | -1.66 | 0.47 | 0.28 | 6.75 | 7.67 | -2.66 | -1.52 | 7.28 | -0.08 | 2.39 | 1.55 | 3.01 | 2.92 | -0.48 | 6.78 | 3.90 | 4.05 | 3.17 | -1.46 | 3.56 | 4.60 | -3.55 | -2.79 | -1.98 | -1.84 | 3.98 | 6.47 | 7.16 | -0.01 | 6.57 | 6.86 | 8.36 |
| 1187 | 1187 | -0.69 | -0.37 | 9.15 | -1.33 | 6.89 | 5.81 | 4.26 | 5.08 | -2.30 | -3.42 | -3.63 | -3.16 | -3.22 | -1.57 | -4.10 | -5.35 | -5.68 | -6.11 | -5.54 | -3.50 | -5.24 | -5.72 | -5.97 | -4.94 | -4.25 | -4.46 | -3.55 | -4.64 | 1.81 | -2.92 | -4.44 | -4.49 | -3.53 | -4.94 | -3.15 | -4.13 | -4.47 | -4.90 | -4.24 | -3.49 | -3.94 | -4.99 | -4.02 | -2.69 | -3.69 | 5.13 | 5.57 | 9.93 | 6.13 | 8.47 | 12.32 | 6.83 | 5.94 | 1.64 | 6.82 | 7.74 | 1.98 | 6.11 | 0.99 | 0.19 | 4.34 | 4.36 | 5.47 | 0.92 | 2.69 | -2.69 | -1.93 | -2.79 | -1.63 | -1.69 | 4.58 | 0.41 | 6.14 | -3.06 | 1.05 | 2.10 | -2.95 | 4.51 | 4.30 | 2.57 | 0.08 | 8.27 | 9.54 | 4.61 | 1.39 | 1.91 | 5.91 | 8.59 | 7.34 | 8.04 | 4.29 | -0.04 | 2.17 | 0.42 | 5.39 | 6.95 | 5.68 | 6.09 | -0.68 | -1.29 | -0.76 | 1.84 | 3.06 | 4.40 | -0.52 | -1.52 | -1.90 | -2.44 | -1.46 | -1.00 | -1.33 | -2.41 | -3.67 | 2.48 | 0.27 | 1.02 | 4.19 | 4.43 | 4.19 | 0.33 | 3.24 | 4.38 | 3.92 | 5.09 | 4.42 | 1.01 | -0.53 | 1.36 | 2.25 | 4.54 | 5.10 | 3.45 | 0.65 | 0.83 | 0.36 | 1.68 | 2.56 | 2.70 | 0.02 | -1.02 | -0.93 | -0.22 | 0.11 | 1.60 | -2.70 | -1.31 | 1.08 | 0.54 | 6.29 | 7.97 | -3.22 | -1.34 | 7.50 | 0.48 | 2.19 | 1.49 | 3.09 | 2.71 | -0.38 | 6.86 | 3.77 | 4.31 | 3.23 | -1.82 | 3.80 | 5.05 | -3.31 | -2.18 | -2.21 | -2.01 | 4.91 | 6.84 | 7.14 | 0.14 | 6.03 | 6.55 | 7.91 |
| 940 | 940 | -0.69 | -0.36 | 8.95 | -0.13 | 7.10 | 5.86 | 4.35 | 5.92 | -1.97 | -3.40 | -3.41 | -2.99 | -3.01 | -1.65 | -3.55 | -4.82 | -5.41 | -5.84 | -5.13 | -2.83 | -4.86 | -5.51 | -5.51 | -4.63 | -3.73 | -4.00 | -2.92 | -4.21 | 2.79 | -1.41 | -4.80 | -5.47 | -2.10 | -5.47 | -2.14 | -4.18 | -4.84 | -5.24 | -4.64 | -3.20 | -3.90 | -5.24 | -3.77 | -2.70 | -2.76 | 5.21 | 5.86 | 9.78 | 6.38 | 8.29 | 12.49 | 7.01 | 6.49 | 1.97 | 7.17 | 7.62 | 2.40 | 6.93 | 1.85 | 1.45 | 5.11 | 5.30 | 6.27 | 2.35 | 3.31 | -2.50 | -1.41 | -2.61 | -0.93 | -1.03 | 4.54 | 1.59 | 6.03 | -2.74 | 1.79 | 2.68 | -8.16 | 5.19 | 5.14 | 3.16 | 0.24 | 9.09 | 10.25 | 5.44 | 1.90 | 2.46 | 6.66 | 9.19 | 8.24 | 8.46 | 5.73 | 1.10 | 2.58 | 1.15 | 6.37 | 7.28 | 6.51 | 7.20 | -0.48 | -0.69 | -0.02 | 2.56 | 3.76 | 5.33 | -0.16 | -1.18 | -1.18 | -2.16 | -1.06 | -0.19 | -0.48 | -2.35 | -3.16 | 2.79 | 0.72 | 2.14 | 4.80 | 4.84 | 4.55 | 1.27 | 4.26 | 5.23 | 4.40 | 5.43 | 4.56 | 2.32 | 0.03 | 2.15 | 3.22 | 5.06 | 5.28 | 3.80 | 1.38 | 1.58 | 0.98 | 2.27 | 2.94 | 3.39 | 0.33 | -0.53 | 0.17 | 0.53 | 0.57 | 1.69 | -2.21 | -0.76 | 1.25 | 0.49 | 6.49 | 8.84 | -4.02 | -1.33 | 7.42 | 0.71 | 2.81 | 2.03 | 3.30 | 3.00 | -0.24 | 7.02 | 3.82 | 4.66 | 3.36 | -1.18 | 3.82 | 4.91 | -2.95 | -2.89 | -2.43 | -2.05 | 4.25 | 7.02 | 7.36 | 0.14 | 6.57 | 6.68 | 8.12 |
| 936 | 936 | -0.19 | -0.34 | 8.54 | -0.62 | 7.01 | 5.95 | 4.24 | 5.41 | -1.89 | -2.84 | -3.38 | -3.11 | -2.94 | -1.45 | -3.83 | -4.43 | -5.61 | -5.41 | -5.54 | -2.94 | -4.78 | -6.06 | -5.88 | -4.70 | -4.82 | -4.46 | -2.66 | -3.82 | 2.85 | -2.70 | -5.16 | -5.47 | -3.31 | -5.61 | -2.80 | -4.11 | -4.97 | -4.86 | -5.01 | -3.63 | -3.78 | -5.29 | -4.17 | -2.49 | -3.65 | 5.31 | 5.60 | 9.87 | 6.67 | 8.05 | 12.33 | 6.72 | 6.42 | 1.25 | 7.28 | 7.37 | 1.99 | 6.28 | 1.17 | 0.50 | 4.52 | 4.43 | 5.54 | 1.30 | 3.08 | -2.92 | -2.16 | -3.18 | -1.66 | -1.63 | 4.55 | 0.53 | 5.73 | -3.27 | 1.30 | 1.70 | -2.57 | 4.53 | 4.14 | 2.61 | -0.18 | 8.32 | 9.62 | 4.82 | 1.58 | 1.99 | 5.82 | 8.59 | 7.58 | 8.39 | 4.68 | 0.36 | 2.01 | -0.31 | 5.71 | 7.35 | 6.22 | 6.66 | -0.70 | -1.42 | -0.62 | 2.13 | 3.54 | 4.85 | -0.72 | -1.53 | -2.04 | -2.37 | -1.38 | -0.96 | -1.57 | -2.91 | -3.60 | 2.37 | 0.21 | 0.92 | 4.05 | 4.27 | 4.33 | 0.24 | 3.38 | 4.45 | 3.71 | 4.74 | 4.44 | 1.51 | -1.73 | 1.51 | 2.27 | 4.37 | 4.89 | 3.40 | 0.66 | 0.83 | 0.27 | 1.50 | 2.30 | 2.60 | 0.14 | -0.90 | -0.99 | -0.53 | -0.30 | 1.14 | -3.06 | -1.69 | 0.39 | 0.19 | 6.21 | 8.05 | -2.75 | -0.87 | 7.79 | 0.87 | 2.48 | 1.62 | 3.28 | 2.93 | -0.41 | 6.91 | 3.75 | 4.38 | 3.20 | -1.07 | 3.81 | 4.89 | -3.36 | -2.40 | -2.06 | -2.03 | 3.99 | 7.36 | 6.94 | 0.14 | 6.26 | 6.47 | 7.98 |
| 788 | 788 | -1.96 | -0.35 | 8.73 | -0.80 | 6.90 | 5.95 | 4.88 | 5.39 | -1.55 | -2.45 | -3.51 | -2.84 | -2.83 | -1.71 | -3.91 | -4.05 | -5.61 | -4.63 | -5.29 | -3.51 | -4.86 | -5.97 | -5.27 | -4.90 | -4.40 | -4.63 | -3.11 | -3.99 | 2.87 | -2.23 | -4.61 | -5.04 | -3.53 | -5.08 | -3.02 | -4.41 | -4.72 | -5.18 | -4.72 | -3.63 | -3.61 | -5.29 | -4.05 | -2.31 | -3.73 | 4.69 | 5.31 | 9.69 | 6.76 | 8.21 | 12.18 | 6.75 | 6.51 | 1.15 | 7.38 | 7.93 | 1.76 | 5.68 | -0.02 | -0.65 | 4.14 | 3.36 | 4.43 | 0.21 | 1.98 | -2.31 | -1.54 | -2.30 | -1.66 | -1.47 | 4.48 | 0.88 | 6.47 | -2.50 | 0.74 | 1.12 | -2.17 | 4.31 | 3.50 | 2.09 | -0.60 | 8.06 | 9.69 | 3.99 | 0.54 | 1.60 | 5.60 | 8.71 | 7.32 | 8.03 | 3.27 | -0.98 | 1.59 | -0.20 | 5.68 | 7.16 | 5.57 | 6.16 | -0.79 | -1.31 | -0.87 | 2.17 | 3.23 | 4.57 | -0.93 | -1.80 | -2.27 | -2.51 | -1.74 | -1.02 | -1.92 | -2.02 | -3.79 | 1.95 | -0.24 | 0.40 | 3.73 | 4.13 | 3.71 | 0.03 | 2.89 | 4.06 | 3.54 | 4.76 | 3.88 | 0.53 | -2.11 | 1.27 | 1.99 | 4.13 | 4.58 | 2.88 | 0.22 | 0.39 | 0.22 | 1.44 | 2.02 | 2.22 | 0.00 | -0.81 | -1.10 | -0.41 | -0.09 | 1.00 | -2.66 | -1.55 | 0.33 | 0.19 | 6.47 | 7.89 | -4.40 | -1.94 | 7.65 | 0.38 | 1.66 | 0.84 | 2.78 | 2.26 | -0.84 | 6.52 | 3.53 | 3.81 | 2.83 | -1.69 | 3.65 | 4.47 | -3.81 | -2.97 | -2.88 | -2.29 | 3.88 | 6.99 | 7.38 | -0.10 | 6.00 | 6.52 | 8.04 |
| 698 | 698 | -1.90 | -0.63 | 8.24 | -0.46 | 6.94 | 5.42 | 4.70 | 4.62 | -1.78 | -3.14 | -3.46 | -2.90 | -2.94 | -1.65 | -4.20 | -4.56 | -5.68 | -5.61 | -5.41 | -2.92 | -5.04 | -5.97 | -6.06 | -4.90 | -4.22 | -4.20 | -3.05 | -4.61 | 2.15 | -2.87 | -4.68 | -5.08 | -3.69 | -5.24 | -3.63 | -4.24 | -5.16 | -5.35 | -4.97 | -3.61 | -3.99 | -5.35 | -3.98 | -2.59 | -3.95 | 5.15 | 5.82 | 10.00 | 5.54 | 8.15 | 12.28 | 6.80 | 6.23 | 1.88 | 7.07 | 7.38 | 2.06 | 6.79 | 1.67 | 1.00 | 4.79 | 4.79 | 5.71 | 1.99 | 3.29 | -2.13 | -1.01 | -1.85 | -1.23 | -0.90 | 4.41 | -0.02 | 6.09 | -2.10 | 1.66 | 2.27 | -3.48 | 4.96 | 4.76 | 2.64 | 0.05 | 8.91 | 9.99 | 5.16 | 1.53 | 2.11 | 6.28 | 8.77 | 8.03 | 8.66 | 5.99 | 0.87 | 2.30 | 0.63 | 6.23 | 7.50 | 6.75 | 7.22 | -0.45 | -0.81 | -0.11 | 2.57 | 3.93 | 5.16 | -0.31 | -1.19 | -1.25 | -1.93 | -0.89 | 0.07 | -0.87 | -1.12 | -3.03 | 2.61 | 0.54 | 1.83 | 4.50 | 4.53 | 4.42 | 1.15 | 4.02 | 4.91 | 4.06 | 5.06 | 4.42 | 2.02 | -1.03 | 1.87 | 2.96 | 4.84 | 5.08 | 3.62 | 1.13 | 1.23 | 0.75 | 2.26 | 2.80 | 3.04 | 0.41 | -0.39 | 0.02 | 0.31 | 0.52 | 1.73 | -2.28 | -0.73 | 1.06 | 0.72 | 6.44 | 7.27 | -3.08 | -1.23 | 7.35 | 0.92 | 2.60 | 2.00 | 3.69 | 3.20 | -0.25 | 7.38 | 4.15 | 5.00 | 3.88 | -1.39 | 4.31 | 5.20 | -3.47 | -2.75 | -1.97 | -1.96 | 4.18 | 6.81 | 6.75 | 0.02 | 6.49 | 5.97 | 7.78 |
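The transposition turns a metabolites-by-subjects matrix into one row per subject, so that it can be joined to subject-level data. How `selected_metabolomics_data` is assembled is not shown in this section; the sketch below assumes a join on `ID` with `merge()` and uses a toy matrix copied from the first values of the table above. The `pheno` table and its outcome values are made up for illustration.

```r
# Toy stand-in for metabol_serum.d: metabolites in rows, subject IDs in columns.
toy_serum <- matrix(c(-2.15, -0.71,    # subject 430
                      -0.69, -0.37,    # subject 1187
                      -0.69, -0.36),   # subject 940
                    nrow = 2,
                    dimnames = list(c("metab_1", "metab_2"),
                                    c("430", "1187", "940")))

# Same steps as in the report: transpose, add ID, move ID to the front.
toy_t <- as.data.frame(t(toy_serum))
toy_t$ID <- as.integer(rownames(toy_t))
toy_t <- toy_t[, c("ID", setdiff(names(toy_t), "ID"))]

# Hypothetical subject-level table; the outcome values are invented.
pheno <- data.frame(ID = c(940L, 430L), hs_zbmi_who = c(0.3, -1.2))

# Assumption: subjects are matched on ID; rows without a match are dropped.
merged <- merge(pheno, toy_t, by = "ID")
print(merged)
```

Because `merge()` keeps only IDs present in both tables by default, checking `nrow(merged)` against the expected number of subjects is a quick way to spot silent losses.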
# Drop rows with any NA (complete-case analysis); this may bias results, but full imputation is difficult
selected_metabolomics_data <- selected_metabolomics_data %>% na.omit()
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who, p = .7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[ trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who ~ ., train_data)[,-1]
y_train <- train_data$hs_zbmi_who
x_test <- model.matrix(hs_zbmi_who ~ ., test_data)[,-1]
y_test <- test_data$hs_zbmi_who
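`model.matrix()` is what turns factors such as `h_cohort` into the dummy columns (`h_cohort2`, `h_cohort3`, ...) that appear in the coefficient tables, and `[, -1]` drops the intercept column because glmnet fits its own. A small sketch with made-up values (only the column names are borrowed from the data):

```r
toy <- data.frame(
  y            = c(0.1, -0.5, 1.2, 0.4, -1.0, 0.7),   # invented outcome values
  h_cohort     = factor(c(1, 2, 3, 1, 2, 3)),         # three-level factor
  hs_cu_c_Log2 = c(9.8, 9.9, 10.1, 9.7, 10.0, 9.6)    # invented numeric predictor
)

x_full <- model.matrix(y ~ ., toy)   # first column is "(Intercept)"
x_toy  <- x_full[, -1]               # drop it, as in the report

# columns: h_cohort2, h_cohort3, hs_cu_c_Log2
print(colnames(x_toy))
```

The first factor level (`h_cohort` = 1 here) becomes the reference category and gets no column of its own. Since the design matrices for the training and test sets are built separately, it is worth confirming that `colnames(x_train)` and `colnames(x_test)` are identical before calling `predict()`.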
lasso_model <- cv.glmnet(x_train, y_train, alpha = 1, family = "gaussian")
plot(lasso_model)
lasso_model$lambda.min
## [1] 0.006317593
coef(lasso_model, s = lasso_model$lambda.min)
## 240 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 9.652612753
## hs_child_age_None -0.125015988
## h_cohort2 -0.178612364
## h_cohort3 0.239984197
## h_cohort4 0.277482344
## h_cohort5 .
## h_cohort6 0.076084251
## e3_sex_Nonemale 0.337746920
## e3_yearbir_None2004 -0.109978813
## e3_yearbir_None2005 -0.037950566
## e3_yearbir_None2006 0.041309884
## e3_yearbir_None2007 0.070061649
## e3_yearbir_None2008 .
## e3_yearbir_None2009 0.440353330
## h_edumc_None2 .
## h_edumc_None3 0.054730974
## h_native_None1 .
## h_native_None2 0.030640414
## hs_cd_c_Log2 0.004233189
## hs_co_c_Log2 .
## hs_cs_c_Log2 0.106471288
## hs_cu_c_Log2 0.161959408
## hs_hg_c_Log2 -0.047861012
## hs_mo_c_Log2 -0.047670350
## hs_pb_c_Log2 .
## hs_dde_cadj_Log2 -0.018322367
## hs_pcb153_cadj_Log2 -0.223070623
## hs_pcb170_cadj_Log2 -0.034259805
## hs_dep_cadj_Log2 -0.011178755
## hs_pbde153_cadj_Log2 -0.015997473
## hs_pfhxs_c_Log2 .
## hs_pfoa_c_Log2 -0.031543757
## hs_pfos_c_Log2 0.011634524
## hs_prpa_cadj_Log2 -0.009984648
## hs_mbzp_cadj_Log2 0.054349990
## hs_mibp_cadj_Log2 .
## hs_mnbp_cadj_Log2 -0.001456852
## h_bfdur_Ter(10.8,34.9] 0.097697017
## h_bfdur_Ter(34.9,Inf] 0.123734223
## hs_bakery_prod_Ter(2,6] 0.018896136
## hs_bakery_prod_Ter(6,Inf] -0.085793766
## hs_dairy_Ter(14.6,25.6] -0.003998940
## hs_dairy_Ter(25.6,Inf] 0.072458675
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] -0.020249340
## hs_org_food_Ter(0.132,1] 0.041429106
## hs_org_food_Ter(1,Inf] 0.018144596
## hs_readymade_Ter(0.132,0.5] 0.076617183
## hs_readymade_Ter(0.5,Inf] 0.052949999
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] -0.046255786
## hs_total_fruits_Ter(7,14.1] 0.081684206
## hs_total_fruits_Ter(14.1,Inf] 0.127364350
## hs_total_lipids_Ter(3,7] 0.057620281
## hs_total_lipids_Ter(7,Inf] .
## hs_total_potatoes_Ter(3,4] 0.008046024
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] .
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] 0.026622043
## hs_total_veg_Ter(8.5,Inf] -0.005827837
## metab_1 -0.020785897
## metab_2 0.073096038
## metab_3 0.032034926
## metab_4 0.003446253
## metab_5 0.468786494
## metab_6 -0.097649880
## metab_7 .
## metab_8 0.217490779
## metab_9 .
## metab_10 0.032087540
## metab_11 0.184273933
## metab_12 -0.149435967
## metab_13 .
## metab_14 -0.434506087
## metab_15 .
## metab_16 .
## metab_17 -0.007565700
## metab_18 -0.186515334
## metab_19 .
## metab_20 .
## metab_21 0.027549314
## metab_22 -0.244341002
## metab_23 0.146432710
## metab_24 0.632567367
## metab_25 -0.130026099
## metab_26 -0.237944780
## metab_27 0.521933363
## metab_28 .
## metab_29 -0.019989581
## metab_30 0.153011592
## metab_31 0.053300669
## metab_32 -0.142487713
## metab_33 .
## metab_34 -0.011121071
## metab_35 .
## metab_36 .
## metab_37 -0.035444486
## metab_38 -0.053022833
## metab_39 -0.003334899
## metab_40 0.044520450
## metab_41 0.249852509
## metab_42 -0.405907450
## metab_43 -0.158581284
## metab_44 -0.045288800
## metab_45 0.130376443
## metab_46 .
## metab_47 0.457616223
## metab_48 -0.783285866
## metab_49 0.117109155
## metab_50 -0.188337510
## metab_51 .
## metab_52 0.443884351
## metab_53 .
## metab_54 0.121219769
## metab_55 .
## metab_56 -0.136020552
## metab_57 .
## metab_58 .
## metab_59 0.603780730
## metab_60 -0.154424356
## metab_61 .
## metab_62 .
## metab_63 -0.161019814
## metab_64 .
## metab_65 .
## metab_66 -0.084873517
## metab_67 -0.263063976
## metab_68 0.137153805
## metab_69 -0.067164949
## metab_70 .
## metab_71 -0.066501000
## metab_72 .
## metab_73 -0.123044461
## metab_74 .
## metab_75 0.305359360
## metab_76 .
## metab_77 0.013700114
## metab_78 -0.127952913
## metab_79 .
## metab_80 .
## metab_81 .
## metab_82 -0.658204717
## metab_83 .
## metab_84 -0.099522817
## metab_85 .
## metab_86 0.377147684
## metab_87 0.050659558
## metab_88 0.627678593
## metab_89 -1.326499919
## metab_90 .
## metab_91 0.135298705
## metab_92 0.104189432
## metab_93 .
## metab_94 -0.058806180
## metab_95 1.648017671
## metab_96 0.013536044
## metab_97 .
## metab_98 .
## metab_99 -0.471488065
## metab_100 0.600253840
## metab_101 .
## metab_102 .
## metab_103 -0.465663773
## metab_104 0.142828557
## metab_105 0.137140807
## metab_106 0.103884323
## metab_107 0.049901842
## metab_108 .
## metab_109 -0.260892420
## metab_110 -0.165457401
## metab_111 .
## metab_112 .
## metab_113 0.659426190
## metab_114 .
## metab_115 0.525579174
## metab_116 .
## metab_117 .
## metab_118 -0.372785082
## metab_119 .
## metab_120 -0.263488288
## metab_121 .
## metab_122 .
## metab_123 .
## metab_124 .
## metab_125 -0.191346026
## metab_126 .
## metab_127 -0.023853642
## metab_128 -0.001830550
## metab_129 .
## metab_130 .
## metab_131 .
## metab_132 .
## metab_133 -0.294838104
## metab_134 0.359310084
## metab_135 -0.264462850
## metab_136 .
## metab_137 -0.389533069
## metab_138 -0.523098646
## metab_139 .
## metab_140 -0.002059593
## metab_141 .
## metab_142 -0.621749560
## metab_143 -0.267165989
## metab_144 0.014590054
## metab_145 -0.191413185
## metab_146 -0.007204806
## metab_147 0.547810987
## metab_148 .
## metab_149 .
## metab_150 0.277334322
## metab_151 -0.005555498
## metab_152 -0.046231938
## metab_153 .
## metab_154 -0.017119716
## metab_155 -0.374027935
## metab_156 .
## metab_157 0.201783156
## metab_158 .
## metab_159 0.131018354
## metab_160 -2.124408269
## metab_161 2.446554446
## metab_162 .
## metab_163 0.597766767
## metab_164 -0.089949931
## metab_165 .
## metab_166 -0.387780142
## metab_167 -0.090688458
## metab_168 -0.008939408
## metab_169 .
## metab_170 .
## metab_171 -0.084162368
## metab_172 .
## metab_173 0.065001642
## metab_174 .
## metab_175 -0.222972504
## metab_176 -0.061410628
## metab_177 0.137079628
lasso_predictions <- predict(lasso_model, s = lasso_model$lambda.min, newx = x_test)
test_mse <- mean((lasso_predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.7397599
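Many coefficients in the lasso output are printed as `.`, meaning exactly zero. A simplified way to see why: in the textbook special case of orthonormal predictors, with the loss written as half the residual sum of squares, the lasso solution is the least-squares estimate passed through a soft-thresholding operator, while a ridge penalty of $\lambda/2$ times the squared norm only rescales it. The sketch below illustrates that special case with invented coefficients; it is not how glmnet computes the fit on correlated data.

```r
# Soft-thresholding: shrink towards zero by lambda, and set to zero if |z| <= lambda.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical least-squares coefficients for four predictors.
ols <- c(metab_a = 0.60, metab_b = -0.05, metab_c = 0.02, metab_d = -0.45)
lambda <- 0.1

lasso_coef <- soft_threshold(ols, lambda)   # small coefficients become exactly 0
ridge_coef <- ols / (1 + lambda)            # every coefficient shrinks, none reaches 0

print(rbind(ols = ols, lasso = lasso_coef, ridge = ridge_coef))
```

This is the mechanism behind the contrast between the ridge table, where every predictor keeps a nonzero coefficient, and the lasso and elastic net tables, where many are dropped.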
# convert hs_zbmi_who to binary based on median
median_value <- median(selected_metabolomics_data$hs_zbmi_who, na.rm = TRUE)
selected_metabolomics_data$hs_zbmi_who_binary <- ifelse(selected_metabolomics_data$hs_zbmi_who > median_value, 1, 0)
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who_binary, p = .7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, train_data)[,-1]
y_train <- train_data$hs_zbmi_who_binary
x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, test_data)[,-1]
y_test <- test_data$hs_zbmi_who_binary
# fit LASSO model using cross-validation
lasso_model <- cv.glmnet(x_train, y_train, alpha = 1, family = "binomial")
plot(lasso_model)
best_lambda <- lasso_model$lambda.min
cat("Best Lambda:", best_lambda, "\n")
## Best Lambda: 0.01054902
# Get coefficients at best lambda
coef(lasso_model, s = best_lambda)
## 240 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 7.937250e+00
## hs_child_age_None -7.322684e-02
## h_cohort2 -3.980672e-01
## h_cohort3 .
## h_cohort4 .
## h_cohort5 .
## h_cohort6 .
## e3_sex_Nonemale 2.057458e-01
## e3_yearbir_None2004 -3.259318e-01
## e3_yearbir_None2005 -2.010076e-02
## e3_yearbir_None2006 5.158542e-02
## e3_yearbir_None2007 .
## e3_yearbir_None2008 .
## e3_yearbir_None2009 8.923769e-01
## h_edumc_None2 4.762899e-02
## h_edumc_None3 .
## h_native_None1 .
## h_native_None2 4.570936e-01
## hs_cd_c_Log2 .
## hs_co_c_Log2 .
## hs_cs_c_Log2 1.236241e-02
## hs_cu_c_Log2 .
## hs_hg_c_Log2 .
## hs_mo_c_Log2 -2.580655e-02
## hs_pb_c_Log2 .
## hs_dde_cadj_Log2 .
## hs_pcb153_cadj_Log2 -3.012537e-01
## hs_pcb170_cadj_Log2 -1.658879e-02
## hs_dep_cadj_Log2 -2.149976e-02
## hs_pbde153_cadj_Log2 -4.576825e-02
## hs_pfhxs_c_Log2 .
## hs_pfoa_c_Log2 -2.656628e-01
## hs_pfos_c_Log2 .
## hs_prpa_cadj_Log2 .
## hs_mbzp_cadj_Log2 8.874895e-05
## hs_mibp_cadj_Log2 .
## hs_mnbp_cadj_Log2 .
## h_bfdur_Ter(10.8,34.9] 7.954782e-02
## h_bfdur_Ter(34.9,Inf] .
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -2.924093e-01
## hs_dairy_Ter(14.6,25.6] .
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] .
## hs_org_food_Ter(0.132,1] 2.613185e-02
## hs_org_food_Ter(1,Inf] 4.987130e-02
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] .
## hs_total_fruits_Ter(14.1,Inf] 4.616319e-02
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] .
## hs_total_potatoes_Ter(3,4] 1.760234e-05
## hs_total_potatoes_Ter(4,Inf] -7.838245e-02
## hs_total_sweets_Ter(4.1,8.5] .
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] 9.408405e-02
## hs_total_veg_Ter(8.5,Inf] -9.448408e-02
## metab_1 .
## metab_2 .
## metab_3 .
## metab_4 1.181569e-01
## metab_5 8.504437e-01
## metab_6 .
## metab_7 .
## metab_8 1.625507e-01
## metab_9 .
## metab_10 .
## metab_11 .
## metab_12 .
## metab_13 .
## metab_14 .
## metab_15 .
## metab_16 .
## metab_17 .
## metab_18 .
## metab_19 .
## metab_20 .
## metab_21 .
## metab_22 .
## metab_23 4.888186e-02
## metab_24 1.279110e-01
## metab_25 .
## metab_26 -3.322651e-01
## metab_27 .
## metab_28 3.260778e-01
## metab_29 .
## metab_30 9.491181e-02
## metab_31 .
## metab_32 .
## metab_33 .
## metab_34 .
## metab_35 .
## metab_36 5.262996e-01
## metab_37 -9.104158e-02
## metab_38 -3.164769e-01
## metab_39 -4.299547e-03
## metab_40 .
## metab_41 .
## metab_42 .
## metab_43 .
## metab_44 .
## metab_45 .
## metab_46 -6.087660e-02
## metab_47 4.005154e-01
## metab_48 -8.718817e-01
## metab_49 8.155354e-01
## metab_50 -2.389608e-01
## metab_51 .
## metab_52 .
## metab_53 .
## metab_54 1.520904e-03
## metab_55 .
## metab_56 .
## metab_57 .
## metab_58 .
## metab_59 2.708218e-01
## metab_60 .
## metab_61 .
## metab_62 .
## metab_63 .
## metab_64 .
## metab_65 2.117688e-02
## metab_66 .
## metab_67 .
## metab_68 .
## metab_69 .
## metab_70 .
## metab_71 -3.137810e-01
## metab_72 .
## metab_73 .
## metab_74 .
## metab_75 .
## metab_76 .
## metab_77 .
## metab_78 .
## metab_79 .
## metab_80 .
## metab_81 6.815883e-01
## metab_82 -1.178690e+00
## metab_83 .
## metab_84 .
## metab_85 .
## metab_86 .
## metab_87 .
## metab_88 .
## metab_89 .
## metab_90 .
## metab_91 .
## metab_92 .
## metab_93 .
## metab_94 .
## metab_95 1.888814e+00
## metab_96 .
## metab_97 .
## metab_98 .
## metab_99 .
## metab_100 .
## metab_101 .
## metab_102 .
## metab_103 .
## metab_104 3.133895e-01
## metab_105 .
## metab_106 .
## metab_107 .
## metab_108 .
## metab_109 .
## metab_110 .
## metab_111 .
## metab_112 .
## metab_113 4.514091e-01
## metab_114 .
## metab_115 1.881794e-01
## metab_116 -2.287344e-02
## metab_117 .
## metab_118 -1.045499e+00
## metab_119 .
## metab_120 -4.159841e-02
## metab_121 .
## metab_122 -7.894047e-01
## metab_123 -1.010749e-02
## metab_124 .
## metab_125 .
## metab_126 .
## metab_127 -1.481293e-01
## metab_128 .
## metab_129 .
## metab_130 .
## metab_131 .
## metab_132 .
## metab_133 -6.203186e-01
## metab_134 .
## metab_135 .
## metab_136 .
## metab_137 .
## metab_138 .
## metab_139 .
## metab_140 .
## metab_141 .
## metab_142 -8.472082e-02
## metab_143 .
## metab_144 .
## metab_145 -5.315799e-01
## metab_146 -1.384647e-01
## metab_147 .
## metab_148 .
## metab_149 .
## metab_150 .
## metab_151 6.612960e-02
## metab_152 -7.002718e-02
## metab_153 .
## metab_154 .
## metab_155 .
## metab_156 .
## metab_157 .
## metab_158 .
## metab_159 .
## metab_160 -2.797393e+00
## metab_161 3.983119e+00
## metab_162 .
## metab_163 2.278074e-02
## metab_164 .
## metab_165 .
## metab_166 .
## metab_167 .
## metab_168 .
## metab_169 .
## metab_170 -1.286578e-01
## metab_171 .
## metab_172 -3.563937e-01
## metab_173 6.535921e-01
## metab_174 .
## metab_175 .
## metab_176 .
## metab_177 1.311648e-02
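Most rows in the coefficient printout are `.`, meaning the LASSO penalty shrank them exactly to zero. A minimal sketch of listing only the retained predictors, ordered by absolute size, is given here. It runs on simulated data rather than `selected_metabolomics_data`, and the helper name `nonzero_coefs` is hypothetical.

```r
library(glmnet)

# Hypothetical helper: keep only the non-zero coefficients of a cv.glmnet fit,
# sorted by absolute size (intercept dropped).
nonzero_coefs <- function(cv_fit, s = "lambda.min") {
  cf <- as.matrix(coef(cv_fit, s = s))          # dense one-column matrix
  cf <- cf[rownames(cf) != "(Intercept)", 1]    # named numeric vector
  cf <- cf[cf != 0]
  cf[order(abs(cf), decreasing = TRUE)]
}

# Simulated data (assumption): 3 informative predictors out of 20
set.seed(101)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- rbinom(n, 1, plogis(2 * x[, 1] - 2 * x[, 2] + 1.5 * x[, 3]))

fit <- cv.glmnet(x, y, alpha = 1, family = "binomial")
selected <- nonzero_coefs(fit)
print(selected)
```

Applied to the fit above, the call would be `nonzero_coefs(lasso_model)`. Because the predictors are on different scales, the ordering by raw coefficient size is only a rough guide to importance.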
lasso_predictions <- predict(lasso_model, s = best_lambda, newx = x_test, type = "response")
# convert probabilities to binary predictions
binary_predictions <- ifelse(lasso_predictions > 0.5, 1, 0)
# make sure levels match between binary_predictions and y_test
binary_predictions <- factor(binary_predictions, levels = c(0, 1))
y_test <- factor(y_test, levels = c(0, 1))
# evaluate accuracy
accuracy <- mean(binary_predictions == y_test)
cat("LASSO Accuracy on Test Set:", accuracy, "\n")
## LASSO Accuracy on Test Set: 0.7402235
conf_matrix <- confusionMatrix(binary_predictions, y_test)
conf_matrix
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   0   1
##          0 133  47
##          1  46 132
##
## Accuracy : 0.7402
## 95% CI : (0.6915, 0.7849)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.4804
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.7430
## Specificity : 0.7374
## Pos Pred Value : 0.7389
## Neg Pred Value : 0.7416
## Prevalence : 0.5000
## Detection Rate : 0.3715
## Detection Prevalence : 0.5028
## Balanced Accuracy : 0.7402
##
## 'Positive' Class : 0
##
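The `'Positive' Class : 0` line matters when reading these statistics: `confusionMatrix()` treats the first factor level as the positive class by default, so the reported sensitivity refers to children at or below the median BMI z-score. The sketch below recomputes the metrics from the four printed counts for either choice of positive class. The helper name `class_metrics` is hypothetical.

```r
# Counts copied from the LASSO confusion matrix above
# (rows = prediction, columns = reference)
cm <- matrix(c(133, 46, 47, 132), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

# Hypothetical helper: metrics for a chosen positive class
class_metrics <- function(cm, positive) {
  negative <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]; fn <- cm[negative, positive]
  tn <- cm[negative, negative]; fp <- cm[positive, negative]
  c(accuracy    = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}

m0 <- class_metrics(cm, positive = "0")  # caret's default for these levels
m1 <- class_metrics(cm, positive = "1")  # above-median BMI z-score as positive
print(round(rbind(positive_0 = m0, positive_1 = m1), 4))
```

Switching the positive class swaps sensitivity with specificity and PPV with NPV; accuracy is unchanged. In caret the same switch can be requested with the `positive = "1"` argument of `confusionMatrix()`.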
# ROC Curve and AUC
roc_curve <- roc(as.numeric(y_test), as.numeric(lasso_predictions))
## Setting levels: control = 1, case = 2
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for LASSO Model")
auc_value <- auc(roc_curve)
cat("LASSO AUC on Test Set:", auc_value, "\n")
## LASSO AUC on Test Set: 0.8037827
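An AUC of about 0.80 can be read as the probability that a randomly chosen above-median child receives a higher predicted probability than a randomly chosen below-median child. The sketch below computes that quantity directly with the Mann-Whitney rank formula on a toy example. The function name `auc_rank` is hypothetical and the numbers are made up for illustration.

```r
# Hypothetical helper: AUC via the Mann-Whitney rank formula.
# labels: 0/1 vector (1 = case); scores: predicted probabilities.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)                 # average ranks handle ties
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example (made-up numbers, not from the study data)
labels <- c(0, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80)
toy_auc <- auc_rank(labels, scores)
# 3 of the 4 case-control pairs are ordered correctly, so the AUC is 0.75
print(toy_auc)
```

Unlike accuracy, this measure does not depend on the 0.5 cut-off used for `binary_predictions`.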
# ridge regression on the continuous BMI z-score: new 70/30 split on hs_zbmi_who
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who, p = 0.7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[ trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who ~ . - hs_zbmi_who_binary, train_data)[,-1]
y_train <- train_data$hs_zbmi_who
x_test <- model.matrix(hs_zbmi_who ~ . - hs_zbmi_who_binary, test_data)[,-1]
y_test <- test_data$hs_zbmi_who
ridge_model <- cv.glmnet(x_train, y_train, alpha = 0, family = "gaussian")
plot(ridge_model)
ridge_model$lambda.min
## [1] 0.1053838
coef(ridge_model, s = ridge_model$lambda.min)
## 240 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 2.4494205140
## hs_child_age_None -0.0807562390
## h_cohort2 -0.2322533256
## h_cohort3 0.2148947058
## h_cohort4 0.2742479627
## h_cohort5 0.0305044324
## h_cohort6 0.1345792853
## e3_sex_Nonemale 0.2762535283
## e3_yearbir_None2004 -0.1307669542
## e3_yearbir_None2005 -0.0662549700
## e3_yearbir_None2006 0.0488073797
## e3_yearbir_None2007 0.0874472743
## e3_yearbir_None2008 0.0140599433
## e3_yearbir_None2009 0.5262950664
## h_edumc_None2 0.0209774799
## h_edumc_None3 0.0770561650
## h_native_None1 -0.0103178724
## h_native_None2 0.0699425628
## hs_cd_c_Log2 0.0077592299
## hs_co_c_Log2 0.0051414013
## hs_cs_c_Log2 0.1047753386
## hs_cu_c_Log2 0.2188135066
## hs_hg_c_Log2 -0.0503031779
## hs_mo_c_Log2 -0.0538732196
## hs_pb_c_Log2 -0.0140226284
## hs_dde_cadj_Log2 -0.0356320943
## hs_pcb153_cadj_Log2 -0.1979003596
## hs_pcb170_cadj_Log2 -0.0399381366
## hs_dep_cadj_Log2 -0.0117487705
## hs_pbde153_cadj_Log2 -0.0168898180
## hs_pfhxs_c_Log2 0.0087279115
## hs_pfoa_c_Log2 -0.0551506976
## hs_pfos_c_Log2 0.0119391752
## hs_prpa_cadj_Log2 -0.0087934187
## hs_mbzp_cadj_Log2 0.0525129143
## hs_mibp_cadj_Log2 0.0065806367
## hs_mnbp_cadj_Log2 -0.0185313521
## h_bfdur_Ter(10.8,34.9] 0.1157904148
## h_bfdur_Ter(34.9,Inf] 0.1561869467
## hs_bakery_prod_Ter(2,6] 0.0287020411
## hs_bakery_prod_Ter(6,Inf] -0.0989934167
## hs_dairy_Ter(14.6,25.6] -0.0278652010
## hs_dairy_Ter(25.6,Inf] 0.0701097207
## hs_fastfood_Ter(0.132,0.5] -0.0212141982
## hs_fastfood_Ter(0.5,Inf] -0.0455694841
## hs_org_food_Ter(0.132,1] 0.0479111617
## hs_org_food_Ter(1,Inf] 0.0297347170
## hs_readymade_Ter(0.132,0.5] 0.1098733269
## hs_readymade_Ter(0.5,Inf] 0.0781587855
## hs_total_bread_Ter(7,17.5] -0.0161375308
## hs_total_bread_Ter(17.5,Inf] -0.0141217088
## hs_total_fish_Ter(1.5,3] -0.0227684885
## hs_total_fish_Ter(3,Inf] -0.0576500860
## hs_total_fruits_Ter(7,14.1] 0.1097538366
## hs_total_fruits_Ter(14.1,Inf] 0.1423782091
## hs_total_lipids_Ter(3,7] 0.0867512469
## hs_total_lipids_Ter(7,Inf] 0.0164799505
## hs_total_potatoes_Ter(3,4] 0.0282635691
## hs_total_potatoes_Ter(4,Inf] 0.0082718606
## hs_total_sweets_Ter(4.1,8.5] -0.0176421543
## hs_total_sweets_Ter(8.5,Inf] -0.0043193424
## hs_total_veg_Ter(6,8.5] 0.0329623256
## hs_total_veg_Ter(8.5,Inf] -0.0201305477
## metab_1 -0.0356324575
## metab_2 0.3093081048
## metab_3 0.1068203955
## metab_4 0.0198384064
## metab_5 0.4048546629
## metab_6 -0.1683142152
## metab_7 0.0756138627
## metab_8 0.3457958796
## metab_9 -0.0511431874
## metab_10 0.0750736543
## metab_11 0.1801856557
## metab_12 -0.1297303683
## metab_13 -0.0398107305
## metab_14 -0.5140013662
## metab_15 -0.0160423797
## metab_16 0.0319625514
## metab_17 -0.0438826380
## metab_18 -0.1818246007
## metab_19 -0.0629747201
## metab_20 0.0313230948
## metab_21 0.2557173994
## metab_22 -0.2593831265
## metab_23 0.1489300876
## metab_24 0.6589167008
## metab_25 -0.1425958734
## metab_26 -0.2566960932
## metab_27 0.3825491134
## metab_28 0.0619920398
## metab_29 -0.0986542491
## metab_30 0.1333568559
## metab_31 0.0673468991
## metab_32 -0.1394744288
## metab_33 0.0147626931
## metab_34 -0.0619354942
## metab_35 -0.0280958420
## metab_36 -0.0290727624
## metab_37 -0.0955180595
## metab_38 -0.0532691604
## metab_39 -0.0104293549
## metab_40 0.3513759032
## metab_41 0.2664273334
## metab_42 -0.4284613382
## metab_43 -0.1853713484
## metab_44 -0.1136538680
## metab_45 0.1802531706
## metab_46 -0.0625154848
## metab_47 0.4363898328
## metab_48 -0.6412412908
## metab_49 0.1311827280
## metab_50 -0.2540788244
## metab_51 0.0342328965
## metab_52 0.5012879046
## metab_53 0.0468728997
## metab_54 0.1158494094
## metab_55 0.0123602953
## metab_56 -0.1924275322
## metab_57 0.0417974880
## metab_58 -0.1537668957
## metab_59 0.4846343980
## metab_60 -0.1542577290
## metab_61 0.0953787222
## metab_62 -0.0471840293
## metab_63 -0.1382679312
## metab_64 0.0530170053
## metab_65 0.0204393397
## metab_66 -0.1330236555
## metab_67 -0.1403191179
## metab_68 0.1283040339
## metab_69 -0.0866717093
## metab_70 -0.0490907794
## metab_71 -0.1250459783
## metab_72 -0.0274823801
## metab_73 -0.1219595478
## metab_74 0.0329537071
## metab_75 0.3019373615
## metab_76 -0.0260404206
## metab_77 0.0094595840
## metab_78 -0.2759417047
## metab_79 0.0162114356
## metab_80 0.0406970885
## metab_81 0.1374685570
## metab_82 -0.4606395602
## metab_83 -0.0960320149
## metab_84 -0.1839203868
## metab_85 -0.0182163420
## metab_86 0.3022951567
## metab_87 0.1261314125
## metab_88 0.4018965809
## metab_89 -0.4060073888
## metab_90 -0.0491506273
## metab_91 0.1519519407
## metab_92 0.1169152959
## metab_93 -0.0240086358
## metab_94 -0.0807376606
## metab_95 0.8113455686
## metab_96 0.3362727238
## metab_97 -0.1870766929
## metab_98 -0.0494372877
## metab_99 -0.4833755828
## metab_100 0.4310798058
## metab_101 0.1084737388
## metab_102 0.0601884712
## metab_103 -0.2634210926
## metab_104 0.2157663551
## metab_105 0.1381333203
## metab_106 0.1075218916
## metab_107 0.1282239294
## metab_108 -0.0008768251
## metab_109 -0.1914029546
## metab_110 -0.2475460177
## metab_111 -0.0943473483
## metab_112 0.0559648830
## metab_113 0.5506388048
## metab_114 0.0589562048
## metab_115 0.4357448367
## metab_116 0.0502224951
## metab_117 -0.1797498640
## metab_118 -0.1849170592
## metab_119 0.1100270441
## metab_120 -0.3318540971
## metab_121 0.1132077882
## metab_122 -0.2645520339
## metab_123 -0.1204987031
## metab_124 0.0500019849
## metab_125 -0.2439620063
## metab_126 0.0254666574
## metab_127 -0.0266735077
## metab_128 -0.0769171410
## metab_129 0.1023250491
## metab_130 -0.1710254468
## metab_131 -0.0242519209
## metab_132 0.0352952669
## metab_133 -0.2934894625
## metab_134 0.2500414946
## metab_135 -0.1544992765
## metab_136 -0.1803280878
## metab_137 -0.3504019669
## metab_138 -0.3525008089
## metab_139 -0.0266390329
## metab_140 -0.0846205257
## metab_141 -0.0803517507
## metab_142 -0.3691865186
## metab_143 -0.2479707334
## metab_144 0.1602881185
## metab_145 -0.2560717174
## metab_146 -0.0366559517
## metab_147 0.3502054310
## metab_148 0.0402649223
## metab_149 0.0504084491
## metab_150 0.2507921258
## metab_151 0.0070653444
## metab_152 -0.0520637190
## metab_153 -0.0470324841
## metab_154 -0.0285517715
## metab_155 -0.2948219993
## metab_156 -0.0657415551
## metab_157 0.1562791906
## metab_158 0.1390909401
## metab_159 0.2112649535
## metab_160 -1.0462789923
## metab_161 1.4450351855
## metab_162 -0.0291513560
## metab_163 0.7390823658
## metab_164 -0.1151060261
## metab_165 -0.0370357382
## metab_166 -0.3800799041
## metab_167 -0.1098805922
## metab_168 -0.0706141033
## metab_169 0.0280169693
## metab_170 -0.0071393153
## metab_171 -0.0617738843
## metab_172 -0.0299629702
## metab_173 0.1259658477
## metab_174 -0.0806184771
## metab_175 -0.2089224854
## metab_176 -0.0844220296
## metab_177 0.1678561164
predictions <- predict(ridge_model, s = ridge_model$lambda.min, newx = x_test)
test_mse <- mean((predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.731053
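A test MSE is easier to judge against a reference value. One common choice is a null model that always predicts the training-set mean, which yields an out-of-sample R². The sketch below shows this on simulated data. The helper name `regression_metrics` is hypothetical, and `lm()` stands in for the penalized fit only to keep the example self-contained.

```r
# Hypothetical helper: test-set MSE, RMSE and R^2 relative to a null model
# that always predicts the training-set mean.
regression_metrics <- function(y_train, y_test, pred) {
  mse      <- mean((y_test - pred)^2)
  mse_null <- mean((y_test - mean(y_train))^2)
  c(mse = mse, rmse = sqrt(mse), mse_null = mse_null, r2 = 1 - mse / mse_null)
}

# Simulated data (assumption): one predictor explaining part of the outcome
set.seed(101)
x <- rnorm(500)
y <- x + rnorm(500)
train <- 1:350; test <- 351:500

fit  <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))
pred <- predict(fit, newdata = data.frame(x = x[test]))
m <- regression_metrics(y[train], y[test], pred)
print(round(m, 3))
```

With the objects from the ridge fit, the corresponding call would be `regression_metrics(y_train, y_test, as.numeric(predictions))`.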
# convert hs_zbmi_who to binary based on median
median_value <- median(selected_metabolomics_data$hs_zbmi_who, na.rm = TRUE)
selected_metabolomics_data$hs_zbmi_who_binary <- ifelse(selected_metabolomics_data$hs_zbmi_who > median_value, 1, 0)
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who_binary, p = 0.7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, train_data)[,-1]
y_train <- train_data$hs_zbmi_who_binary
x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, test_data)[,-1]
y_test <- test_data$hs_zbmi_who_binary
# fit ridge model using cross-validation
ridge_model <- cv.glmnet(x_train, y_train, alpha = 0, family = "binomial")
plot(ridge_model)
best_lambda <- ridge_model$lambda.min
cat("Best Lambda:", best_lambda, "\n")
## Best Lambda: 0.09175002
coef(ridge_model, s = best_lambda)
## 240 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -3.1290728529
## hs_child_age_None -0.0586481056
## h_cohort2 -0.2663041638
## h_cohort3 0.1435932560
## h_cohort4 0.1969858711
## h_cohort5 0.1158459554
## h_cohort6 0.1780244278
## e3_sex_Nonemale 0.1230039748
## e3_yearbir_None2004 -0.3124975610
## e3_yearbir_None2005 -0.0944057219
## e3_yearbir_None2006 0.0559329921
## e3_yearbir_None2007 0.1043983171
## e3_yearbir_None2008 0.0964413750
## e3_yearbir_None2009 0.9837058948
## h_edumc_None2 0.1641793864
## h_edumc_None3 0.0830119438
## h_native_None1 0.0004119497
## h_native_None2 0.3150207915
## hs_cd_c_Log2 -0.0195166972
## hs_co_c_Log2 0.0801221848
## hs_cs_c_Log2 0.1054396159
## hs_cu_c_Log2 0.1705679267
## hs_hg_c_Log2 -0.0161756218
## hs_mo_c_Log2 -0.0754056982
## hs_pb_c_Log2 -0.0410281648
## hs_dde_cadj_Log2 -0.0583766290
## hs_pcb153_cadj_Log2 -0.2217363771
## hs_pcb170_cadj_Log2 -0.0375230172
## hs_dep_cadj_Log2 -0.0248423449
## hs_pbde153_cadj_Log2 -0.0388870871
## hs_pfhxs_c_Log2 0.0180825617
## hs_pfoa_c_Log2 -0.2412180321
## hs_pfos_c_Log2 -0.0165278019
## hs_prpa_cadj_Log2 0.0008529876
## hs_mbzp_cadj_Log2 0.0383964223
## hs_mibp_cadj_Log2 -0.0095597323
## hs_mnbp_cadj_Log2 -0.0570776451
## h_bfdur_Ter(10.8,34.9] 0.1769788276
## h_bfdur_Ter(34.9,Inf] 0.1206273136
## hs_bakery_prod_Ter(2,6] 0.0017410711
## hs_bakery_prod_Ter(6,Inf] -0.2612067971
## hs_dairy_Ter(14.6,25.6] -0.0123591313
## hs_dairy_Ter(25.6,Inf] -0.0174048131
## hs_fastfood_Ter(0.132,0.5] -0.0947679355
## hs_fastfood_Ter(0.5,Inf] -0.0549241633
## hs_org_food_Ter(0.132,1] 0.1136167763
## hs_org_food_Ter(1,Inf] 0.1682163295
## hs_readymade_Ter(0.132,0.5] 0.1215917587
## hs_readymade_Ter(0.5,Inf] 0.0535259119
## hs_total_bread_Ter(7,17.5] -0.1214821627
## hs_total_bread_Ter(17.5,Inf] -0.0634722347
## hs_total_fish_Ter(1.5,3] -0.0053614464
## hs_total_fish_Ter(3,Inf] -0.0380008278
## hs_total_fruits_Ter(7,14.1] 0.0231411035
## hs_total_fruits_Ter(14.1,Inf] 0.1231840389
## hs_total_lipids_Ter(3,7] 0.0301494359
## hs_total_lipids_Ter(7,Inf] -0.0229767438
## hs_total_potatoes_Ter(3,4] 0.0385652000
## hs_total_potatoes_Ter(4,Inf] -0.0908046511
## hs_total_sweets_Ter(4.1,8.5] 0.0495514872
## hs_total_sweets_Ter(8.5,Inf] 0.0499303412
## hs_total_veg_Ter(6,8.5] 0.1405000191
## hs_total_veg_Ter(8.5,Inf] -0.1473289785
## metab_1 0.0115176602
## metab_2 0.2861279889
## metab_3 0.0833757200
## metab_4 0.0833701720
## metab_5 0.5884114064
## metab_6 -0.1674709537
## metab_7 0.0762856038
## metab_8 0.4063369557
## metab_9 -0.1142505123
## metab_10 0.1093658778
## metab_11 -0.1747928560
## metab_12 -0.1546406451
## metab_13 -0.0549921435
## metab_14 -0.0869963107
## metab_15 0.0232990794
## metab_16 0.0819369516
## metab_17 -0.1240927578
## metab_18 -0.0090787386
## metab_19 0.0060151461
## metab_20 -0.0749172082
## metab_21 0.2448413297
## metab_22 -0.1869099173
## metab_23 0.2173737052
## metab_24 0.5717082444
## metab_25 0.0107053835
## metab_26 -0.2987322578
## metab_27 0.1399967511
## metab_28 0.3576024408
## metab_29 -0.0776832275
## metab_30 0.1872533756
## metab_31 0.0117049438
## metab_32 -0.0852185808
## metab_33 -0.0449736726
## metab_34 0.0795776267
## metab_35 -0.0603628939
## metab_36 0.4117549638
## metab_37 -0.1837769377
## metab_38 -0.2864884780
## metab_39 -0.1685351453
## metab_40 0.4079876970
## metab_41 0.1611417732
## metab_42 -0.1917172916
## metab_43 -0.2126068236
## metab_44 -0.0888143608
## metab_45 0.0454526136
## metab_46 -0.2358128748
## metab_47 0.4284773612
## metab_48 -0.7091955362
## metab_49 0.4912595738
## metab_50 -0.3074575509
## metab_51 0.2457988608
## metab_52 0.3510731351
## metab_53 0.1284500656
## metab_54 0.2649541314
## metab_55 0.1060371324
## metab_56 -0.1037894614
## metab_57 0.2087599074
## metab_58 -0.0391098920
## metab_59 0.2545021184
## metab_60 -0.1238079824
## metab_61 0.2045150013
## metab_62 -0.0558297071
## metab_63 -0.0578015302
## metab_64 0.1372523796
## metab_65 0.1068148068
## metab_66 -0.0560738506
## metab_67 -0.0885504372
## metab_68 0.0559271234
## metab_69 0.0323735953
## metab_70 0.0737419387
## metab_71 -0.3565062531
## metab_72 -0.0204269619
## metab_73 -0.1764151392
## metab_74 -0.0375749523
## metab_75 0.2934356894
## metab_76 -0.1358354294
## metab_77 0.0028609744
## metab_78 -0.1822954707
## metab_79 0.0128901575
## metab_80 0.1367655608
## metab_81 0.4514590233
## metab_82 -0.4195053256
## metab_83 -0.2226774898
## metab_84 -0.1189751089
## metab_85 0.0729523484
## metab_86 0.0355370361
## metab_87 0.1655933448
## metab_88 0.2870479032
## metab_89 -0.1363376335
## metab_90 -0.1667274159
## metab_91 0.0830913441
## metab_92 0.1253420831
## metab_93 -0.1735514917
## metab_94 -0.0439978886
## metab_95 0.7292625735
## metab_96 0.2833287294
## metab_97 -0.0083098327
## metab_98 -0.0575887960
## metab_99 -0.1386461788
## metab_100 0.2378170209
## metab_101 0.0426243034
## metab_102 0.2191330260
## metab_103 0.1866341219
## metab_104 0.2763372587
## metab_105 0.0984997675
## metab_106 0.0328896314
## metab_107 0.1613021683
## metab_108 0.2529959282
## metab_109 -0.0802225570
## metab_110 -0.1381237953
## metab_111 -0.2552546502
## metab_112 0.0429241595
## metab_113 0.4407947243
## metab_114 0.2838332373
## metab_115 0.3360216746
## metab_116 -0.2440569379
## metab_117 -0.2463183538
## metab_118 -0.3776606445
## metab_119 -0.1158462092
## metab_120 -0.2979398104
## metab_121 -0.1861150168
## metab_122 -0.3385493562
## metab_123 -0.2744617637
## metab_124 -0.1482200696
## metab_125 -0.0574693311
## metab_126 -0.0504192509
## metab_127 -0.1130907830
## metab_128 -0.2014293379
## metab_129 0.0436769002
## metab_130 -0.4066108734
## metab_131 -0.1014991614
## metab_132 0.0115365688
## metab_133 -0.4496445630
## metab_134 0.1808691855
## metab_135 0.0382172341
## metab_136 -0.2553028557
## metab_137 0.0481165040
## metab_138 -0.0332786021
## metab_139 0.1838214411
## metab_140 -0.1436916780
## metab_141 -0.1643696043
## metab_142 -0.3210381940
## metab_143 -0.2046510206
## metab_144 0.2362733668
## metab_145 -0.4474093473
## metab_146 -0.2478010006
## metab_147 0.1175383288
## metab_148 0.1270776122
## metab_149 -0.1266463449
## metab_150 0.1120017283
## metab_151 0.1174044898
## metab_152 -0.0680622988
## metab_153 -0.1549243474
## metab_154 0.0059712815
## metab_155 -0.1312489344
## metab_156 -0.1514464586
## metab_157 0.0879938991
## metab_158 0.1395521566
## metab_159 0.0991797155
## metab_160 -0.5309867132
## metab_161 1.2417573426
## metab_162 0.0587041479
## metab_163 0.6143989325
## metab_164 0.1112577644
## metab_165 -0.0184690014
## metab_166 -0.2298261668
## metab_167 -0.0117753978
## metab_168 -0.1315176028
## metab_169 -0.1046103335
## metab_170 -0.1341008606
## metab_171 -0.0106993433
## metab_172 -0.2587978537
## metab_173 0.4844551782
## metab_174 -0.2304842382
## metab_175 0.0602172097
## metab_176 0.1891704474
## metab_177 0.1613095211
ridge_predictions <- predict(ridge_model, s = best_lambda, newx = x_test, type = "response")
# convert probabilities to binary predictions
binary_predictions <- ifelse(ridge_predictions > 0.5, 1, 0)
# make sure levels match between binary_predictions and y_test
binary_predictions <- factor(binary_predictions, levels = c(0, 1))
y_test <- factor(y_test, levels = c(0, 1))
# evaluate accuracy
accuracy <- mean(binary_predictions == y_test)
cat("Ridge Accuracy on Test Set:", accuracy, "\n")
## Ridge Accuracy on Test Set: 0.7206704
conf_matrix <- confusionMatrix(binary_predictions, y_test)
conf_matrix
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   0   1
##          0 123  44
##          1  56 135
##
## Accuracy : 0.7207
## 95% CI : (0.6711, 0.7665)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.4413
##
## Mcnemar's Test P-Value : 0.2713
##
## Sensitivity : 0.6872
## Specificity : 0.7542
## Pos Pred Value : 0.7365
## Neg Pred Value : 0.7068
## Prevalence : 0.5000
## Detection Rate : 0.3436
## Detection Prevalence : 0.4665
## Balanced Accuracy : 0.7207
##
## 'Positive' Class : 0
##
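McNemar's test in this output compares the two off-diagonal cells, that is, whether the model errs more often in one direction than the other. The sketch below recomputes it from the printed ridge counts with base R's `mcnemar.test()` and by hand. The p-value of about 0.27 gives no clear evidence of such an imbalance.

```r
# Counts copied from the ridge confusion matrix above
# (rows = prediction, columns = reference)
cm_ridge <- matrix(c(123, 56, 44, 135), nrow = 2,
                   dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

# McNemar's test compares the two off-diagonal cells (44 vs 56)
res <- mcnemar.test(cm_ridge)   # continuity correction is on by default
print(res$p.value)

# The same statistic by hand: (|b - c| - 1)^2 / (b + c)
b    <- cm_ridge["0", "1"]
c_   <- cm_ridge["1", "0"]
stat <- (abs(b - c_) - 1)^2 / (b + c_)
p_by_hand <- pchisq(stat, df = 1, lower.tail = FALSE)
print(p_by_hand)
```

For the LASSO table the off-diagonal counts are 47 and 46, so the corrected statistic is zero and the p-value is 1, as reported there.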
# ROC Curve and AUC
roc_curve <- roc(as.numeric(y_test), as.numeric(ridge_predictions))
## Setting levels: control = 1, case = 2
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Ridge Model")
auc_value <- auc(roc_curve)
cat("Ridge AUC on Test Set:", auc_value, "\n")
## Ridge AUC on Test Set: 0.8011922
# elastic net on the continuous BMI z-score: new 70/30 split on hs_zbmi_who
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who, p = 0.7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[ trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who ~ . - hs_zbmi_who_binary, train_data)[,-1]
y_train <- train_data$hs_zbmi_who
x_test <- model.matrix(hs_zbmi_who ~ . - hs_zbmi_who_binary, test_data)[,-1]
y_test <- test_data$hs_zbmi_who
enet_model <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "gaussian")
plot(enet_model)
enet_model$lambda.min
## [1] 0.01263519
coef(enet_model, s = enet_model$lambda.min)
## 240 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 8.578351892
## hs_child_age_None -0.115863248
## h_cohort2 -0.199517040
## h_cohort3 0.226486455
## h_cohort4 0.270706077
## h_cohort5 .
## h_cohort6 0.080649371
## e3_sex_Nonemale 0.331314265
## e3_yearbir_None2004 -0.103535525
## e3_yearbir_None2005 -0.039765996
## e3_yearbir_None2006 0.043243559
## e3_yearbir_None2007 0.068552528
## e3_yearbir_None2008 .
## e3_yearbir_None2009 0.445442699
## h_edumc_None2 .
## h_edumc_None3 0.054491500
## h_native_None1 .
## h_native_None2 0.034320469
## hs_cd_c_Log2 0.003067315
## hs_co_c_Log2 .
## hs_cs_c_Log2 0.107718759
## hs_cu_c_Log2 0.168073040
## hs_hg_c_Log2 -0.047465586
## hs_mo_c_Log2 -0.047744918
## hs_pb_c_Log2 .
## hs_dde_cadj_Log2 -0.020451748
## hs_pcb153_cadj_Log2 -0.219706322
## hs_pcb170_cadj_Log2 -0.035112266
## hs_dep_cadj_Log2 -0.011132999
## hs_pbde153_cadj_Log2 -0.016222997
## hs_pfhxs_c_Log2 .
## hs_pfoa_c_Log2 -0.032627881
## hs_pfos_c_Log2 0.008348682
## hs_prpa_cadj_Log2 -0.009537863
## hs_mbzp_cadj_Log2 0.053475790
## hs_mibp_cadj_Log2 .
## hs_mnbp_cadj_Log2 -0.002706282
## h_bfdur_Ter(10.8,34.9] 0.101043321
## h_bfdur_Ter(34.9,Inf] 0.124373658
## hs_bakery_prod_Ter(2,6] 0.018983276
## hs_bakery_prod_Ter(6,Inf] -0.087478595
## hs_dairy_Ter(14.6,25.6] -0.007705120
## hs_dairy_Ter(25.6,Inf] 0.069664607
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] -0.019016401
## hs_org_food_Ter(0.132,1] 0.039322506
## hs_org_food_Ter(1,Inf] 0.015758438
## hs_readymade_Ter(0.132,0.5] 0.078026127
## hs_readymade_Ter(0.5,Inf] 0.054721344
## hs_total_bread_Ter(7,17.5] .
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] -0.043448319
## hs_total_fruits_Ter(7,14.1] 0.080944909
## hs_total_fruits_Ter(14.1,Inf] 0.126550300
## hs_total_lipids_Ter(3,7] 0.060602634
## hs_total_lipids_Ter(7,Inf] .
## hs_total_potatoes_Ter(3,4] 0.009693939
## hs_total_potatoes_Ter(4,Inf] .
## hs_total_sweets_Ter(4.1,8.5] .
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] 0.026747318
## hs_total_veg_Ter(8.5,Inf] -0.006850275
## metab_1 -0.020808800
## metab_2 0.096683977
## metab_3 0.031072809
## metab_4 0.003171039
## metab_5 0.445974148
## metab_6 -0.105161707
## metab_7 0.005410952
## metab_8 0.234963593
## metab_9 .
## metab_10 0.028167489
## metab_11 0.179658243
## metab_12 -0.146803264
## metab_13 .
## metab_14 -0.443178624
## metab_15 .
## metab_16 .
## metab_17 -0.006140514
## metab_18 -0.171479325
## metab_19 .
## metab_20 .
## metab_21 0.047522305
## metab_22 -0.238392630
## metab_23 0.136406835
## metab_24 0.627204646
## metab_25 -0.126974841
## metab_26 -0.244950868
## metab_27 0.493623258
## metab_28 .
## metab_29 -0.022190986
## metab_30 0.147834978
## metab_31 0.051202465
## metab_32 -0.136067374
## metab_33 .
## metab_34 .
## metab_35 .
## metab_36 .
## metab_37 -0.042637486
## metab_38 -0.060170781
## metab_39 .
## metab_40 0.069886895
## metab_41 0.249082893
## metab_42 -0.414203858
## metab_43 -0.164870171
## metab_44 -0.052613203
## metab_45 0.126989129
## metab_46 .
## metab_47 0.451687523
## metab_48 -0.769383671
## metab_49 0.121626575
## metab_50 -0.195490856
## metab_51 .
## metab_52 0.437724646
## metab_53 .
## metab_54 0.117392944
## metab_55 .
## metab_56 -0.128140519
## metab_57 .
## metab_58 .
## metab_59 0.585488747
## metab_60 -0.145473912
## metab_61 .
## metab_62 .
## metab_63 -0.155176645
## metab_64 .
## metab_65 .
## metab_66 -0.079016208
## metab_67 -0.231657476
## metab_68 0.115684723
## metab_69 -0.053817763
## metab_70 .
## metab_71 -0.075636786
## metab_72 .
## metab_73 -0.113840395
## metab_74 .
## metab_75 0.291789027
## metab_76 .
## metab_77 0.012878526
## metab_78 -0.144180621
## metab_79 .
## metab_80 .
## metab_81 .
## metab_82 -0.663922686
## metab_83 .
## metab_84 -0.094860108
## metab_85 .
## metab_86 0.357294893
## metab_87 0.064179237
## metab_88 0.574755853
## metab_89 -1.143000128
## metab_90 .
## metab_91 0.135199218
## metab_92 0.109637762
## metab_93 .
## metab_94 -0.059295586
## metab_95 1.499983404
## metab_96 0.051363171
## metab_97 .
## metab_98 .
## metab_99 -0.433436320
## metab_100 0.577311913
## metab_101 .
## metab_102 .
## metab_103 -0.435522691
## metab_104 0.175872915
## metab_105 0.140581911
## metab_106 0.088994443
## metab_107 0.047991077
## metab_108 .
## metab_109 -0.258584717
## metab_110 -0.169475959
## metab_111 .
## metab_112 .
## metab_113 0.640532268
## metab_114 .
## metab_115 0.528199313
## metab_116 .
## metab_117 .
## metab_118 -0.339965015
## metab_119 .
## metab_120 -0.274743544
## metab_121 .
## metab_122 -0.073184707
## metab_123 .
## metab_124 .
## metab_125 -0.179193784
## metab_126 .
## metab_127 -0.024652725
## metab_128 -0.005445981
## metab_129 .
## metab_130 .
## metab_131 .
## metab_132 .
## metab_133 -0.320335848
## metab_134 0.327507471
## metab_135 -0.226372203
## metab_136 .
## metab_137 -0.393600744
## metab_138 -0.519594951
## metab_139 .
## metab_140 -0.016402886
## metab_141 .
## metab_142 -0.577761021
## metab_143 -0.268800773
## metab_144 0.005096758
## metab_145 -0.200534322
## metab_146 -0.001492012
## metab_147 0.516271413
## metab_148 .
## metab_149 .
## metab_150 0.273289288
## metab_151 -0.001296237
## metab_152 -0.045905617
## metab_153 .
## metab_154 -0.018929928
## metab_155 -0.333286274
## metab_156 .
## metab_157 0.180931535
## metab_158 .
## metab_159 0.137589540
## metab_160 -2.001321991
## metab_161 2.313776131
## metab_162 .
## metab_163 0.629853044
## metab_164 -0.090531303
## metab_165 .
## metab_166 -0.380179693
## metab_167 -0.088266549
## metab_168 -0.012943947
## metab_169 .
## metab_170 .
## metab_171 -0.076154014
## metab_172 .
## metab_173 0.061244506
## metab_174 .
## metab_175 -0.217032699
## metab_176 -0.049665193
## metab_177 0.132238749
predictions <- predict(enet_model, s = enet_model$lambda.min, newx = x_test)
test_mse <- mean((predictions - y_test)^2)
cat("Mean Squared Error on Test Set:", test_mse, "\n")
## Mean Squared Error on Test Set: 0.7364518
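The elastic net above fixes `alpha = 0.5`, halfway between ridge (`alpha = 0`) and LASSO (`alpha = 1`). One possible refinement, sketched here on simulated data, is to compare several values of `alpha` by cross-validation while reusing the same fold assignment through the `foldid` argument of `cv.glmnet()`, so that the errors are comparable. The grid of `alpha` values and the simulated data are assumptions for illustration.

```r
library(glmnet)

# Simulated data (assumption): 5 informative predictors out of 30
set.seed(101)
n <- 200; p <- 30
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1:5] %*% rep(1, 5) + rnorm(n))

# Fix the fold assignment so every alpha is scored on the same folds
foldid <- sample(rep(1:10, length.out = n))
alphas <- c(0, 0.25, 0.5, 0.75, 1)

cv_error <- sapply(alphas, function(a) {
  fit <- cv.glmnet(x, y, alpha = a, family = "gaussian", foldid = foldid)
  min(fit$cvm)   # cross-validated MSE at lambda.min
})

results <- data.frame(alpha = alphas, cv_mse = cv_error)
print(results)
best_alpha <- results$alpha[which.min(results$cv_mse)]
```

Any such search should use only the training data, leaving the test set for the final error estimate.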
# convert hs_zbmi_who to binary based on median
median_value <- median(selected_metabolomics_data$hs_zbmi_who, na.rm = TRUE)
selected_metabolomics_data$hs_zbmi_who_binary <- ifelse(selected_metabolomics_data$hs_zbmi_who > median_value, 1, 0)
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who_binary, p = 0.7,
                                  list = FALSE,
                                  times = 1)
train_data <- selected_metabolomics_data[trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, train_data)[,-1]
y_train <- train_data$hs_zbmi_who_binary
x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, test_data)[,-1]
y_test <- test_data$hs_zbmi_who_binary
# fit elastic net model (alpha = 0.5) using cross-validation
enet_model <- cv.glmnet(x_train, y_train, alpha = 0.5, family = "binomial")
plot(enet_model)
best_lambda <- enet_model$lambda.min
cat("Best Lambda:", best_lambda, "\n")
## Best Lambda: 0.01922375
coef(enet_model, s = best_lambda)
## 240 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) 2.4020477436
## hs_child_age_None -0.0560014051
## h_cohort2 -0.4073651020
## h_cohort3 .
## h_cohort4 .
## h_cohort5 .
## h_cohort6 .
## e3_sex_Nonemale 0.1634352085
## e3_yearbir_None2004 -0.3111975187
## e3_yearbir_None2005 -0.0274407810
## e3_yearbir_None2006 0.0547947452
## e3_yearbir_None2007 .
## e3_yearbir_None2008 .
## e3_yearbir_None2009 0.8681768833
## h_edumc_None2 0.0650518497
## h_edumc_None3 .
## h_native_None1 .
## h_native_None2 0.4274276556
## hs_cd_c_Log2 .
## hs_co_c_Log2 .
## hs_cs_c_Log2 0.0326280011
## hs_cu_c_Log2 .
## hs_hg_c_Log2 .
## hs_mo_c_Log2 -0.0342030845
## hs_pb_c_Log2 .
## hs_dde_cadj_Log2 -0.0034750012
## hs_pcb153_cadj_Log2 -0.2942329613
## hs_pcb170_cadj_Log2 -0.0222308611
## hs_dep_cadj_Log2 -0.0209386238
## hs_pbde153_cadj_Log2 -0.0449801939
## hs_pfhxs_c_Log2 .
## hs_pfoa_c_Log2 -0.2664887408
## hs_pfos_c_Log2 .
## hs_prpa_cadj_Log2 .
## hs_mbzp_cadj_Log2 0.0066356831
## hs_mibp_cadj_Log2 .
## hs_mnbp_cadj_Log2 -0.0023050433
## h_bfdur_Ter(10.8,34.9] 0.0988547127
## h_bfdur_Ter(34.9,Inf] 0.0147006452
## hs_bakery_prod_Ter(2,6] .
## hs_bakery_prod_Ter(6,Inf] -0.2924949144
## hs_dairy_Ter(14.6,25.6] .
## hs_dairy_Ter(25.6,Inf] .
## hs_fastfood_Ter(0.132,0.5] .
## hs_fastfood_Ter(0.5,Inf] .
## hs_org_food_Ter(0.132,1] 0.0487143467
## hs_org_food_Ter(1,Inf] 0.0774958323
## hs_readymade_Ter(0.132,0.5] .
## hs_readymade_Ter(0.5,Inf] .
## hs_total_bread_Ter(7,17.5] -0.0209828411
## hs_total_bread_Ter(17.5,Inf] .
## hs_total_fish_Ter(1.5,3] .
## hs_total_fish_Ter(3,Inf] .
## hs_total_fruits_Ter(7,14.1] .
## hs_total_fruits_Ter(14.1,Inf] 0.0394897294
## hs_total_lipids_Ter(3,7] .
## hs_total_lipids_Ter(7,Inf] .
## hs_total_potatoes_Ter(3,4] 0.0034319177
## hs_total_potatoes_Ter(4,Inf] -0.0753743752
## hs_total_sweets_Ter(4.1,8.5] .
## hs_total_sweets_Ter(8.5,Inf] .
## hs_total_veg_Ter(6,8.5] 0.1062968323
## hs_total_veg_Ter(8.5,Inf] -0.1091028878
## metab_1 .
## metab_2 .
## metab_3 .
## metab_4 0.1031376233
## metab_5 0.7634800567
## metab_6 .
## metab_7 .
## metab_8 0.2601092101
## metab_9 -0.0068450365
## metab_10 .
## metab_11 .
## metab_12 -0.0269087755
## metab_13 .
## metab_14 .
## metab_15 .
## metab_16 .
## metab_17 .
## metab_18 .
## metab_19 .
## metab_20 .
## metab_21 .
## metab_22 -0.0013508870
## metab_23 0.0904894843
## metab_24 0.2189482566
## metab_25 .
## metab_26 -0.4024123986
## metab_27 .
## metab_28 0.3841973716
## metab_29 .
## metab_30 0.0732898024
## metab_31 .
## metab_32 .
## metab_33 .
## metab_34 .
## metab_35 .
## metab_36 0.4249850652
## metab_37 -0.1096358525
## metab_38 -0.3348966019
## metab_39 .
## metab_40 .
## metab_41 .
## metab_42 .
## metab_43 .
## metab_44 .
## metab_45 .
## metab_46 -0.0922234432
## metab_47 0.4151670014
## metab_48 -0.8444050484
## metab_49 0.7696982049
## metab_50 -0.2442075003
## metab_51 .
## metab_52 .
## metab_53 .
## metab_54 0.0629050382
## metab_55 .
## metab_56 .
## metab_57 .
## metab_58 .
## metab_59 0.2754506887
## metab_60 .
## metab_61 .
## metab_62 .
## metab_63 .
## metab_64 .
## metab_65 0.0471572094
## metab_66 .
## metab_67 .
## metab_68 .
## metab_69 .
## metab_70 .
## metab_71 -0.3145978213
## metab_72 .
## metab_73 .
## metab_74 .
## metab_75 .
## metab_76 .
## metab_77 .
## metab_78 .
## metab_79 .
## metab_80 .
## metab_81 0.6542405688
## metab_82 -0.8624167458
## metab_83 .
## metab_84 .
## metab_85 .
## metab_86 .
## metab_87 .
## metab_88 .
## metab_89 .
## metab_90 .
## metab_91 .
## metab_92 0.0477387700
## metab_93 .
## metab_94 .
## metab_95 1.6775299871
## metab_96 .
## metab_97 .
## metab_98 .
## metab_99 .
## metab_100 .
## metab_101 .
## metab_102 .
## metab_103 .
## metab_104 0.3021326705
## metab_105 .
## metab_106 .
## metab_107 .
## metab_108 .
## metab_109 .
## metab_110 .
## metab_111 -0.0217730053
## metab_112 .
## metab_113 0.5047685046
## metab_114 .
## metab_115 0.2196103381
## metab_116 -0.2148789307
## metab_117 .
## metab_118 -0.6877284976
## metab_119 .
## metab_120 -0.1959035770
## metab_121 .
## metab_122 -0.6430105205
## metab_123 -0.3036600651
## metab_124 .
## metab_125 .
## metab_126 .
## metab_127 -0.1296679747
## metab_128 -0.0223137651
## metab_129 .
## metab_130 -0.0709113161
## metab_131 .
## metab_132 .
## metab_133 -0.5420970043
## metab_134 .
## metab_135 .
## metab_136 .
## metab_137 .
## metab_138 .
## metab_139 .
## metab_140 .
## metab_141 .
## metab_142 -0.1885139404
## metab_143 .
## metab_144 .
## metab_145 -0.5316793004
## metab_146 -0.1148885375
## metab_147 .
## metab_148 .
## metab_149 .
## metab_150 0.0160259618
## metab_151 0.0807523014
## metab_152 -0.0698930172
## metab_153 .
## metab_154 .
## metab_155 .
## metab_156 .
## metab_157 .
## metab_158 .
## metab_159 .
## metab_160 -1.7695413881
## metab_161 2.7371423123
## metab_162 .
## metab_163 0.3055854670
## metab_164 .
## metab_165 .
## metab_166 -0.0421310230
## metab_167 .
## metab_168 -0.0009165542
## metab_169 .
## metab_170 -0.1220026556
## metab_171 .
## metab_172 -0.3375446638
## metab_173 0.6102159262
## metab_174 .
## metab_175 .
## metab_176 .
## metab_177 0.1029920014
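Most rows in the coefficient printout above are `.` (exactly zero), so the selected predictors are easier to read once the sparse column is filtered. The following is a minimal sketch, assuming only the `Matrix` package; `toy_coef` is a four-row stand-in copied from the printout and `nonzero_coefs()` is a hypothetical helper, not part of glmnet.

```r
library(Matrix)  # provides the dgCMatrix class that coef() returns for glmnet fits

# Toy stand-in for coef(enet_model, s = best_lambda): four rows copied from the printout
toy_coef <- sparseMatrix(
  i = c(1, 2, 3), j = c(1, 1, 1),
  x = c(2.4020477436, -0.0560014051, -0.4073651020),
  dims = c(4, 1),
  dimnames = list(c("(Intercept)", "hs_child_age_None", "h_cohort2", "h_cohort3"), "s1")
)

# Hypothetical helper: keep non-zero coefficients, largest magnitude first
nonzero_coefs <- function(coef_mat) {
  dense <- as.matrix(coef_mat)
  keep <- dense[, 1] != 0
  out <- data.frame(term = rownames(dense)[keep],
                    estimate = dense[keep, 1],
                    row.names = NULL)
  out[order(-abs(out$estimate)), ]
}

nz <- nonzero_coefs(toy_coef)
print(nz)
```

Under the same assumption, `nonzero_coefs(coef(enet_model, s = best_lambda))` would apply the filter to the fitted model.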
enet_predictions <- predict(enet_model, s = best_lambda, newx = x_test, type = "response")
# convert probabilities to binary predictions
binary_predictions <- ifelse(enet_predictions > 0.5, 1, 0)
# make sure levels match between binary_predictions and y_test
binary_predictions <- factor(binary_predictions, levels = c(0, 1))
y_test <- factor(y_test, levels = c(0, 1))
# accuracy on the test set
accuracy <- mean(binary_predictions == y_test)
cat("Elastic Net Accuracy on Test Set:", accuracy, "\n")
## Elastic Net Accuracy on Test Set: 0.7318436
conf_matrix <- confusionMatrix(binary_predictions, y_test)
conf_matrix
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 131 48
## 1 48 131
##
## Accuracy : 0.7318
## 95% CI : (0.6828, 0.777)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.4637
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.7318
## Specificity : 0.7318
## Pos Pred Value : 0.7318
## Neg Pred Value : 0.7318
## Prevalence : 0.5000
## Detection Rate : 0.3659
## Detection Prevalence : 0.5000
## Balanced Accuracy : 0.7318
##
## 'Positive' Class : 0
##
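The summary statistics reported by `confusionMatrix()` can be reproduced by hand from the four counts, which helps when checking which class is treated as positive. This is a base-R sketch using the counts printed above; the variable names are illustrative.

```r
# Counts copied from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(c(131, 48,
               48, 131),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

n <- sum(cm)
accuracy <- sum(diag(cm)) / n

# The printout reports class "0" as the positive class
sensitivity <- cm["0", "0"] / sum(cm[, "0"])
specificity <- cm["1", "1"] / sum(cm[, "1"])

# Cohen's kappa: observed agreement corrected for chance agreement
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

cat("Accuracy:", accuracy, " Sensitivity:", sensitivity,
    " Specificity:", specificity, " Kappa:", cohen_kappa, "\n")
```

Because the median split makes both classes equally frequent, chance agreement is 0.5 and kappa reduces to twice the accuracy minus one.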
# ROC Curve and AUC
roc_curve <- roc(as.numeric(y_test), as.numeric(enet_predictions))
## Setting levels: control = 1, case = 2
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Elastic Net Model")
auc_value <- auc(roc_curve)
cat("Elastic Net AUC on Test Set:", auc_value, "\n")
## Elastic Net AUC on Test Set: 0.8029088
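The `roc()` call above receives the outcome coded as 1/2 (the result of `as.numeric()` on a factor) rather than 0/1. AUC depends only on how the scores rank the two classes, so the coding does not change the value. The sketch below illustrates this with a rank-based (Mann-Whitney) AUC in base R; `auc_rank()` is a hypothetical helper and the four observations are made-up toy data, not study data.

```r
# Hypothetical helper: rank-based AUC, assuming higher scores indicate the case class
auc_rank <- function(labels, scores) {
  f <- as.factor(labels)
  stopifnot(nlevels(f) == 2)
  is_case <- as.integer(f) == 2          # second level is treated as the case class
  r <- rank(scores)                      # ties receive average ranks
  n_case <- sum(is_case)
  n_ctrl <- sum(!is_case)
  (sum(r[is_case]) - n_case * (n_case + 1) / 2) / (n_case * n_ctrl)
}

# Toy data (illustrative only)
y <- factor(c(0, 0, 1, 1), levels = c(0, 1))
s <- c(0.1, 0.4, 0.35, 0.8)

auc_factor  <- auc_rank(y, s)              # labels as a 0/1 factor
auc_numeric <- auc_rank(as.numeric(y), s)  # labels coded 1/2, as in the roc() call
cat("AUC (0/1 factor):", auc_factor, " AUC (1/2 numeric):", auc_numeric, "\n")
```

Three of the four case-control pairs are ordered correctly in the toy data, so both calls return the same value.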
set.seed(101)
rf_model <- randomForest(hs_zbmi_who ~ . -hs_zbmi_who_binary, data = train_data, ntree = 500)
rf_predictions <- predict(rf_model, newdata = test_data)
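# NOTE: y_test is still the binary factor from the classification step, so the
# subtraction is not defined and the result is NA (see output below). A regression
# MSE would compare the predictions against test_data$hs_zbmi_who instead.
rf_mse <- mean((rf_predictions - y_test)^2)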
cat("Random Forest Mean Squared Error on Test Set:", rf_mse, "\n")
## Random Forest Mean Squared Error on Test Set: NA
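The `NA` above arises because arithmetic is not defined for factors in R. The following base-R sketch reproduces the behaviour on made-up toy values and shows the comparison against a continuous outcome that a regression MSE requires; none of the numbers are study data.

```r
# Toy values (illustrative only)
preds    <- c(-0.2, 0.5, 1.1)                   # continuous predictions
y_factor <- factor(c(0, 1, 1), levels = c(0, 1)) # binary outcome stored as a factor
y_cont   <- c(-0.5, 0.7, 1.0)                   # continuous outcome

# Subtracting a factor returns NA for every element (R also emits a warning)
mse_factor <- mean(suppressWarnings(preds - y_factor)^2)

# Comparing against the continuous outcome gives a usable MSE
mse_cont <- mean((preds - y_cont)^2)

cat("MSE against factor:", mse_factor, "\n")
cat("MSE against continuous outcome:", mse_cont, "\n")
```

In the analysis above, the continuous target would be `test_data$hs_zbmi_who`, assuming the test set still carries that column.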
importance(rf_model)
## IncNodePurity
## hs_child_age_None 5.5074522
## h_cohort 19.4916857
## e3_sex_None 0.4492505
## e3_yearbir_None 5.8039370
## h_edumc_None 1.4385402
## h_native_None 1.7658986
## hs_cd_c_Log2 5.6183507
## hs_co_c_Log2 4.4794448
## hs_cs_c_Log2 4.8703163
## hs_cu_c_Log2 8.5678120
## hs_hg_c_Log2 6.4784728
## hs_mo_c_Log2 6.7818712
## hs_pb_c_Log2 4.6451124
## hs_dde_cadj_Log2 19.5653594
## hs_pcb153_cadj_Log2 17.0696101
## hs_pcb170_cadj_Log2 52.7773388
## hs_dep_cadj_Log2 6.7184329
## hs_pbde153_cadj_Log2 16.8722302
## hs_pfhxs_c_Log2 6.6002292
## hs_pfoa_c_Log2 13.1283651
## hs_pfos_c_Log2 8.8809389
## hs_prpa_cadj_Log2 5.6488812
## hs_mbzp_cadj_Log2 4.7983076
## hs_mibp_cadj_Log2 4.2370823
## hs_mnbp_cadj_Log2 4.9558276
## h_bfdur_Ter 1.2575134
## hs_bakery_prod_Ter 2.9254318
## hs_dairy_Ter 1.2260774
## hs_fastfood_Ter 0.9047281
## hs_org_food_Ter 1.5336253
## hs_readymade_Ter 1.3668650
## hs_total_bread_Ter 1.0428489
## hs_total_fish_Ter 1.3550813
## hs_total_fruits_Ter 1.0898909
## hs_total_lipids_Ter 1.0470290
## hs_total_potatoes_Ter 0.9123339
## hs_total_sweets_Ter 1.6372221
## hs_total_veg_Ter 1.4580138
## metab_1 4.0871578
## metab_2 4.2128177
## metab_3 4.7029776
## metab_4 5.8636216
## metab_5 4.5489668
## metab_6 9.8393989
## metab_7 4.3061556
## metab_8 36.1438350
## metab_9 3.1976746
## metab_10 3.6374170
## metab_11 3.7559461
## metab_12 2.4729509
## metab_13 3.0108598
## metab_14 3.8064562
## metab_15 3.2799798
## metab_16 2.2249745
## metab_17 2.9852186
## metab_18 1.9925296
## metab_19 2.7307790
## metab_20 3.7053660
## metab_21 2.1785723
## metab_22 1.9225963
## metab_23 2.0059957
## metab_24 2.3489537
## metab_25 3.9994896
## metab_26 6.8505449
## metab_27 2.5988484
## metab_28 3.8400569
## metab_29 3.1568753
## metab_30 19.8615408
## metab_31 3.9556422
## metab_32 2.8956486
## metab_33 5.0415088
## metab_34 1.9760372
## metab_35 6.6564642
## metab_36 3.1010204
## metab_37 2.6569784
## metab_38 2.4040112
## metab_39 2.4260736
## metab_40 3.8362923
## metab_41 3.6742931
## metab_42 2.3948847
## metab_43 3.0761983
## metab_44 2.8462257
## metab_45 3.4434926
## metab_46 4.9750622
## metab_47 7.6679729
## metab_48 11.7097595
## metab_49 53.5181385
## metab_50 9.2778981
## metab_51 4.7769057
## metab_52 5.9956658
## metab_53 5.0377482
## metab_54 5.7990430
## metab_55 4.5033979
## metab_56 4.2665208
## metab_57 4.5263447
## metab_58 2.8768517
## metab_59 6.8572432
## metab_60 4.5240185
## metab_61 3.0325074
## metab_62 3.7119392
## metab_63 3.4153570
## metab_64 4.1487240
## metab_65 3.1945214
## metab_66 2.8112278
## metab_67 3.2072152
## metab_68 3.9196796
## metab_69 2.7667018
## metab_70 3.6862662
## metab_71 3.5275714
## metab_72 3.1849869
## metab_73 3.2050063
## metab_74 2.6949838
## metab_75 3.6272659
## metab_76 2.9534465
## metab_77 4.7055225
## metab_78 5.4798728
## metab_79 3.8960521
## metab_80 3.6397873
## metab_81 3.0255419
## metab_82 4.2731811
## metab_83 3.1672685
## metab_84 3.3761037
## metab_85 7.1480799
## metab_86 3.0342445
## metab_87 3.1990462
## metab_88 3.1502847
## metab_89 2.0640603
## metab_90 3.1518544
## metab_91 3.8310605
## metab_92 2.9746674
## metab_93 2.6820865
## metab_94 9.3598587
## metab_95 52.5632971
## metab_96 13.1494768
## metab_97 3.0896610
## metab_98 3.7069636
## metab_99 5.7433224
## metab_100 3.1887943
## metab_101 3.1908409
## metab_102 5.2689107
## metab_103 2.5897429
## metab_104 4.2242263
## metab_105 3.8655538
## metab_106 3.0282458
## metab_107 3.3185893
## metab_108 3.4897988
## metab_109 4.4365747
## metab_110 4.9830908
## metab_111 3.1476995
## metab_112 2.8362703
## metab_113 4.4385828
## metab_114 3.4837432
## metab_115 4.7996199
## metab_116 6.6046162
## metab_117 5.1747420
## metab_118 3.3627455
## metab_119 5.8751976
## metab_120 7.6859256
## metab_121 4.4947484
## metab_122 6.7219227
## metab_123 3.5101852
## metab_124 4.0121806
## metab_125 2.9179718
## metab_126 2.5829507
## metab_127 5.0579414
## metab_128 8.5064765
## metab_129 3.4979494
## metab_130 4.0003071
## metab_131 2.5970924
## metab_132 2.7297049
## metab_133 2.7821347
## metab_134 3.8032324
## metab_135 4.8982450
## metab_136 5.3801748
## metab_137 5.1391792
## metab_138 5.2830234
## metab_139 3.8560108
## metab_140 2.7863988
## metab_141 10.6921849
## metab_142 10.6114033
## metab_143 11.6015673
## metab_144 3.0581726
## metab_145 5.4116243
## metab_146 6.6799793
## metab_147 4.3133137
## metab_148 3.9434950
## metab_149 4.4806132
## metab_150 4.3264355
## metab_151 3.4339273
## metab_152 5.9309425
## metab_153 6.7072909
## metab_154 6.2609053
## metab_155 2.5536669
## metab_156 2.5273062
## metab_157 3.3192833
## metab_158 2.6606954
## metab_159 2.8835429
## metab_160 7.2901430
## metab_161 16.0767528
## metab_162 3.5712194
## metab_163 8.9745145
## metab_164 7.0366866
## metab_165 3.5349826
## metab_166 2.8503588
## metab_167 3.1266057
## metab_168 3.2890921
## metab_169 3.9190006
## metab_170 4.7075326
## metab_171 6.9721360
## metab_172 4.6671661
## metab_173 5.1196117
## metab_174 3.4247350
## metab_175 4.9665905
## metab_176 6.1612095
## metab_177 8.3370310
varImpPlot(rf_model)
# ROC Curve and AUC
roc_curve <- roc(as.numeric(as.character(y_test)), as.numeric(as.character(rf_predictions)))
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Random Forest Model")
auc_value <- auc(roc_curve)
cat("Random Forest AUC on Test Set:", auc_value, "\n")
## Random Forest AUC on Test Set: 0.7725726
selected_metabolomics_data <- selected_metabolomics_data %>% na.omit()
# dichotomize hs_zbmi_who at its median (1 = above median, 0 = at or below)
median_value <- median(selected_metabolomics_data$hs_zbmi_who, na.rm = TRUE)
selected_metabolomics_data$hs_zbmi_who_binary <- ifelse(selected_metabolomics_data$hs_zbmi_who > median_value, 1, 0)
selected_metabolomics_data$hs_zbmi_who_binary <- factor(selected_metabolomics_data$hs_zbmi_who_binary, levels = c(0, 1), labels = c("0", "1"))
set.seed(101)
trainIndex <- createDataPartition(selected_metabolomics_data$hs_zbmi_who_binary, p = .7,
list = FALSE,
times = 1)
train_data <- selected_metabolomics_data[trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
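# exclude the continuous outcome so it cannot leak into the predictor matrix
x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, train_data)[,-1]
y_train <- train_data$hs_zbmi_who_binary
x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, test_data)[,-1]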
y_test <- test_data$hs_zbmi_who_binary
set.seed(101)
rf_model <- randomForest(hs_zbmi_who_binary ~ . -hs_zbmi_who, data = train_data, ntree = 500)
rf_predictions_prob <- predict(rf_model, newdata = test_data, type = "prob")[,2]
rf_predictions <- predict(rf_model, newdata = test_data)
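# for 0/1 class labels the squared error is 1 for a misclassification and 0 otherwise,
# so this "MSE" equals the misclassification rate (1 - accuracy)
rf_mse <- mean((as.numeric(as.character(rf_predictions)) - as.numeric(as.character(y_test)))^2)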
cat("Random Forest Mean Squared Error on Test Set:", rf_mse, "\n")
## Random Forest Mean Squared Error on Test Set: 0.3072626
importance(rf_model)
## MeanDecreaseGini
## hs_child_age_None 1.8523293
## h_cohort 3.3479617
## e3_sex_None 0.1040747
## e3_yearbir_None 1.3693376
## h_edumc_None 0.4007728
## h_native_None 0.5716096
## hs_cd_c_Log2 2.0735471
## hs_co_c_Log2 1.6621328
## hs_cs_c_Log2 2.1745413
## hs_cu_c_Log2 2.1906329
## hs_hg_c_Log2 2.0161000
## hs_mo_c_Log2 2.1869577
## hs_pb_c_Log2 1.9878770
## hs_dde_cadj_Log2 3.9288392
## hs_pcb153_cadj_Log2 5.3039284
## hs_pcb170_cadj_Log2 8.7092125
## hs_dep_cadj_Log2 2.4571462
## hs_pbde153_cadj_Log2 4.7637945
## hs_pfhxs_c_Log2 2.5413269
## hs_pfoa_c_Log2 4.1792489
## hs_pfos_c_Log2 3.0811819
## hs_prpa_cadj_Log2 2.0536045
## hs_mbzp_cadj_Log2 1.8033959
## hs_mibp_cadj_Log2 1.7484547
## hs_mnbp_cadj_Log2 1.7348190
## h_bfdur_Ter 0.6339703
## hs_bakery_prod_Ter 0.5878442
## hs_dairy_Ter 0.2434327
## hs_fastfood_Ter 0.1845531
## hs_org_food_Ter 0.2072968
## hs_readymade_Ter 0.3005943
## hs_total_bread_Ter 0.3357945
## hs_total_fish_Ter 0.4122144
## hs_total_fruits_Ter 0.2736450
## hs_total_lipids_Ter 0.3954569
## hs_total_potatoes_Ter 0.3891347
## hs_total_sweets_Ter 0.1939762
## hs_total_veg_Ter 0.5094748
## metab_1 1.7109996
## metab_2 2.1424708
## metab_3 2.1151758
## metab_4 3.5424401
## metab_5 2.1797806
## metab_6 2.2480108
## metab_7 2.2489912
## metab_8 4.2774417
## metab_9 1.7610228
## metab_10 1.7685733
## metab_11 1.4575901
## metab_12 1.6536111
## metab_13 1.4001754
## metab_14 1.6016016
## metab_15 1.2990389
## metab_16 1.7940074
## metab_17 1.0069289
## metab_18 1.3688455
## metab_19 1.3142545
## metab_20 1.6738605
## metab_21 1.3146701
## metab_22 1.1629587
## metab_23 1.2397936
## metab_24 1.5184173
## metab_25 1.3403026
## metab_26 1.8571963
## metab_27 1.6379500
## metab_28 2.7852755
## metab_29 1.7570262
## metab_30 3.5977656
## metab_31 1.7964019
## metab_32 1.4625578
## metab_33 1.6468947
## metab_34 1.0247106
## metab_35 1.8056123
## metab_36 1.7166192
## metab_37 1.1014846
## metab_38 1.1496938
## metab_39 1.1905605
## metab_40 1.3512653
## metab_41 1.5383893
## metab_42 1.1824578
## metab_43 1.6879711
## metab_44 1.5111428
## metab_45 1.8191252
## metab_46 1.6115893
## metab_47 2.9455712
## metab_48 2.7233195
## metab_49 9.0757620
## metab_50 2.1644236
## metab_51 2.1874906
## metab_52 1.9432381
## metab_53 2.9044352
## metab_54 2.9120093
## metab_55 2.3686913
## metab_56 2.2252842
## metab_57 1.7873563
## metab_58 1.4269410
## metab_59 2.6388253
## metab_60 1.7717135
## metab_61 1.7858749
## metab_62 1.7418597
## metab_63 1.7856215
## metab_64 1.5896259
## metab_65 2.0104752
## metab_66 1.5850147
## metab_67 1.7720968
## metab_68 1.6081484
## metab_69 1.4655227
## metab_70 1.6966824
## metab_71 1.8969630
## metab_72 1.8656166
## metab_73 1.8975098
## metab_74 1.3328979
## metab_75 1.7640205
## metab_76 2.0179031
## metab_77 1.6029826
## metab_78 2.2538563
## metab_79 1.9218813
## metab_80 2.0646595
## metab_81 1.6624515
## metab_82 2.1874769
## metab_83 1.7090610
## metab_84 1.6666232
## metab_85 2.0956107
## metab_86 1.6292567
## metab_87 1.7897006
## metab_88 1.4322982
## metab_89 1.4487496
## metab_90 1.7324393
## metab_91 1.6481035
## metab_92 1.7081247
## metab_93 1.3216934
## metab_94 2.4265592
## metab_95 7.3691396
## metab_96 4.6046467
## metab_97 1.5680555
## metab_98 1.7334903
## metab_99 1.9842036
## metab_100 1.4593013
## metab_101 1.4962482
## metab_102 2.9286482
## metab_103 1.7995811
## metab_104 2.0042121
## metab_105 1.5233338
## metab_106 1.7532421
## metab_107 1.7044256
## metab_108 1.5355808
## metab_109 1.8251939
## metab_110 2.3320165
## metab_111 2.0128538
## metab_112 1.7377734
## metab_113 2.3712733
## metab_114 1.6990983
## metab_115 1.4007781
## metab_116 2.6324561
## metab_117 2.0493509
## metab_118 1.9590967
## metab_119 1.6945789
## metab_120 2.1263983
## metab_121 1.5200319
## metab_122 2.7128775
## metab_123 1.9272855
## metab_124 1.8747739
## metab_125 1.4552243
## metab_126 1.7179112
## metab_127 2.4613674
## metab_128 1.6515006
## metab_129 1.5716569
## metab_130 1.8222297
## metab_131 1.3981787
## metab_132 1.2985384
## metab_133 1.9020675
## metab_134 1.7976656
## metab_135 1.4393536
## metab_136 1.5948977
## metab_137 1.5721024
## metab_138 1.8372330
## metab_139 1.3633590
## metab_140 1.2936696
## metab_141 2.3115194
## metab_142 2.2707234
## metab_143 1.6900013
## metab_144 1.5823535
## metab_145 2.2555999
## metab_146 2.2061952
## metab_147 1.4162967
## metab_148 1.6483307
## metab_149 2.2464976
## metab_150 1.7813079
## metab_151 2.0734559
## metab_152 1.9623828
## metab_153 1.7961992
## metab_154 2.2859253
## metab_155 1.6463759
## metab_156 1.5660346
## metab_157 1.9665845
## metab_158 1.3425696
## metab_159 1.7602764
## metab_160 1.8770333
## metab_161 5.1306727
## metab_162 2.0290646
## metab_163 3.6280902
## metab_164 2.5887911
## metab_165 2.0201035
## metab_166 1.5318234
## metab_167 1.8498371
## metab_168 1.8635521
## metab_169 1.8239681
## metab_170 1.9615842
## metab_171 2.5313165
## metab_172 1.8581499
## metab_173 1.7592050
## metab_174 1.8164449
## metab_175 2.1994183
## metab_176 3.2170862
## metab_177 4.0289999
varImpPlot(rf_model)
# ROC Curve and AUC
roc_curve <- roc(as.numeric(as.character(y_test)), as.numeric(as.character(rf_predictions_prob)))
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Random Forest Model")
auc_value <- auc(roc_curve)
cat("Random Forest AUC on Test Set:", auc_value, "\n")
## Random Forest AUC on Test Set: 0.7508193
conf_matrix <- confusionMatrix(rf_predictions, y_test)
print(conf_matrix)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 118 49
## 1 61 130
##
## Accuracy : 0.6927
## 95% CI : (0.6421, 0.7402)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 1.131e-13
##
## Kappa : 0.3855
##
## Mcnemar's Test P-Value : 0.2943
##
## Sensitivity : 0.6592
## Specificity : 0.7263
## Pos Pred Value : 0.7066
## Neg Pred Value : 0.6806
## Prevalence : 0.5000
## Detection Rate : 0.3296
## Detection Prevalence : 0.4665
## Balanced Accuracy : 0.6927
##
## 'Positive' Class : 0
##
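The value 0.3072626 reported earlier as the random forest "Mean Squared Error" is the misclassification rate, because the squared difference between two 0/1 labels is 1 for an error and 0 otherwise. The base-R sketch below rebuilds label vectors from the four confusion-matrix counts above to confirm this; the vector names are illustrative.

```r
# Cell counts from the confusion matrix above:
# (pred 0, ref 0) = 118, (pred 0, ref 1) = 49, (pred 1, ref 0) = 61, (pred 1, ref 1) = 130
counts <- c(118, 49, 61, 130)
pred   <- rep(c(0, 0, 1, 1), times = counts)
truth  <- rep(c(0, 1, 0, 1), times = counts)

mse_labels <- mean((pred - truth)^2)   # squared error on 0/1 labels
error_rate <- mean(pred != truth)      # misclassification rate
accuracy   <- mean(pred == truth)

cat("MSE on labels:", mse_labels, " Error rate:", error_rate,
    " Accuracy:", accuracy, "\n")
```

A probability-based measure such as the Brier score would instead compare `rf_predictions_prob` with the 0/1 outcome.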
set.seed(101)
gbm_model <- gbm(hs_zbmi_who ~ . - hs_zbmi_who_binary, data = train_data,
distribution = "gaussian",
n.trees = 1000,
interaction.depth = 3,
n.minobsinnode = 10,
shrinkage = 0.01,
cv.folds = 5,
verbose = TRUE)
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.4345 nan 0.0100 0.0030
## 2 1.4303 nan 0.0100 0.0026
## 3 1.4256 nan 0.0100 0.0034
## 4 1.4212 nan 0.0100 0.0030
## 5 1.4167 nan 0.0100 0.0023
## 6 1.4125 nan 0.0100 0.0025
## 7 1.4080 nan 0.0100 0.0036
## 8 1.4038 nan 0.0100 0.0028
## 9 1.3986 nan 0.0100 0.0040
## 10 1.3936 nan 0.0100 0.0030
## 20 1.3502 nan 0.0100 0.0038
## 40 1.2714 nan 0.0100 0.0025
## 60 1.2058 nan 0.0100 0.0014
## 80 1.1486 nan 0.0100 0.0021
## 100 1.0992 nan 0.0100 0.0010
## 120 1.0544 nan 0.0100 0.0005
## 140 1.0151 nan 0.0100 0.0006
## 160 0.9758 nan 0.0100 0.0008
## 180 0.9415 nan 0.0100 0.0004
## 200 0.9107 nan 0.0100 0.0004
## 220 0.8806 nan 0.0100 0.0001
## 240 0.8517 nan 0.0100 0.0010
## 260 0.8253 nan 0.0100 0.0000
## 280 0.8012 nan 0.0100 0.0007
## 300 0.7783 nan 0.0100 -0.0001
## 320 0.7566 nan 0.0100 0.0004
## 340 0.7376 nan 0.0100 0.0001
## 360 0.7170 nan 0.0100 0.0002
## 380 0.6984 nan 0.0100 -0.0005
## 400 0.6823 nan 0.0100 0.0005
## 420 0.6650 nan 0.0100 0.0000
## 440 0.6492 nan 0.0100 0.0001
## 460 0.6339 nan 0.0100 0.0003
## 480 0.6201 nan 0.0100 0.0000
## 500 0.6070 nan 0.0100 -0.0001
## 520 0.5947 nan 0.0100 0.0003
## 540 0.5821 nan 0.0100 -0.0002
## 560 0.5698 nan 0.0100 0.0002
## 580 0.5582 nan 0.0100 0.0001
## 600 0.5472 nan 0.0100 -0.0000
## 620 0.5361 nan 0.0100 0.0000
## 640 0.5259 nan 0.0100 -0.0001
## 660 0.5166 nan 0.0100 -0.0001
## 680 0.5066 nan 0.0100 0.0001
## 700 0.4975 nan 0.0100 -0.0002
## 720 0.4888 nan 0.0100 0.0003
## 740 0.4798 nan 0.0100 -0.0000
## 760 0.4710 nan 0.0100 -0.0000
## 780 0.4626 nan 0.0100 -0.0002
## 800 0.4546 nan 0.0100 -0.0001
## 820 0.4469 nan 0.0100 -0.0002
## 840 0.4394 nan 0.0100 -0.0001
## 860 0.4322 nan 0.0100 -0.0001
## 880 0.4251 nan 0.0100 -0.0000
## 900 0.4184 nan 0.0100 0.0001
## 920 0.4118 nan 0.0100 -0.0002
## 940 0.4050 nan 0.0100 -0.0001
## 960 0.3981 nan 0.0100 -0.0002
## 980 0.3916 nan 0.0100 -0.0001
## 1000 0.3858 nan 0.0100 -0.0000
best_trees <- gbm.perf(gbm_model, method = "cv")
gbm_predictions <- predict(gbm_model, newdata = test_data, n.trees = best_trees)
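# NOTE: y_test is the binary factor here, so the subtraction yields NA (see output below);
# the continuous outcome test_data$hs_zbmi_who is the matching target for this gaussian model.
gbm_mse <- mean((gbm_predictions - y_test)^2)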
cat("GBM Mean Squared Error on Test Set:", gbm_mse, "\n")
## GBM Mean Squared Error on Test Set: NA
gbm_importance <- summary(gbm_model)
print(gbm_importance)
## var rel.inf
## hs_pcb170_cadj_Log2 hs_pcb170_cadj_Log2 7.17452557
## metab_95 metab_95 7.01343903
## metab_49 metab_49 6.29708960
## metab_8 metab_8 3.95379904
## metab_161 metab_161 3.13067643
## h_cohort h_cohort 2.85498639
## metab_163 metab_163 2.54063347
## hs_pbde153_cadj_Log2 hs_pbde153_cadj_Log2 2.20596365
## metab_26 metab_26 2.16772212
## hs_dde_cadj_Log2 hs_dde_cadj_Log2 2.11660555
## metab_177 metab_177 1.84792341
## metab_143 metab_143 1.79549171
## metab_48 metab_48 1.75233342
## metab_30 metab_30 1.67215010
## hs_pfoa_c_Log2 hs_pfoa_c_Log2 1.65769679
## hs_cu_c_Log2 hs_cu_c_Log2 1.55529425
## metab_142 metab_142 1.50928543
## metab_47 metab_47 1.46863592
## metab_120 metab_120 1.46391164
## metab_160 metab_160 1.43134372
## metab_141 metab_141 1.28844230
## metab_59 metab_59 1.27953690
## metab_171 metab_171 1.23101024
## metab_6 metab_6 1.17712040
## metab_50 metab_50 1.14085803
## metab_94 metab_94 1.03652607
## metab_115 metab_115 1.01127946
## metab_96 metab_96 0.99458189
## metab_154 metab_154 0.96051941
## hs_pfos_c_Log2 hs_pfos_c_Log2 0.86671736
## metab_122 metab_122 0.84234437
## metab_146 metab_146 0.79612293
## metab_78 metab_78 0.72108650
## metab_128 metab_128 0.71803369
## metab_110 metab_110 0.71420291
## hs_hg_c_Log2 hs_hg_c_Log2 0.71177199
## hs_bakery_prod_Ter hs_bakery_prod_Ter 0.69131177
## metab_153 metab_153 0.67376194
## metab_117 metab_117 0.61954855
## metab_172 metab_172 0.60951813
## metab_91 metab_91 0.60873206
## metab_130 metab_130 0.59565372
## hs_pcb153_cadj_Log2 hs_pcb153_cadj_Log2 0.56883396
## metab_68 metab_68 0.53475617
## hs_child_age_None hs_child_age_None 0.49832122
## metab_116 metab_116 0.49065463
## metab_136 metab_136 0.49046402
## metab_113 metab_113 0.48807115
## metab_162 metab_162 0.44911200
## metab_57 metab_57 0.44576546
## metab_99 metab_99 0.44345569
## hs_pfhxs_c_Log2 hs_pfhxs_c_Log2 0.43693720
## metab_55 metab_55 0.42357257
## hs_co_c_Log2 hs_co_c_Log2 0.40932149
## hs_mo_c_Log2 hs_mo_c_Log2 0.40713040
## metab_75 metab_75 0.40495809
## metab_145 metab_145 0.40004711
## metab_104 metab_104 0.38534591
## metab_3 metab_3 0.38469627
## metab_28 metab_28 0.36650478
## metab_27 metab_27 0.34265574
## metab_54 metab_54 0.32862271
## metab_7 metab_7 0.32064813
## metab_176 metab_176 0.30550173
## metab_109 metab_109 0.29742323
## metab_119 metab_119 0.29553128
## e3_sex_None e3_sex_None 0.29515840
## metab_82 metab_82 0.29161440
## metab_43 metab_43 0.28997110
## hs_pb_c_Log2 hs_pb_c_Log2 0.27731432
## metab_149 metab_149 0.27206310
## metab_85 metab_85 0.26465783
## metab_77 metab_77 0.26342051
## metab_15 metab_15 0.26152983
## metab_105 metab_105 0.25676464
## hs_mbzp_cadj_Log2 hs_mbzp_cadj_Log2 0.25337587
## metab_102 metab_102 0.24703000
## metab_20 metab_20 0.24434219
## metab_81 metab_81 0.23967894
## metab_152 metab_152 0.23833237
## metab_39 metab_39 0.23817782
## metab_2 metab_2 0.23556811
## metab_71 metab_71 0.23553088
## h_bfdur_Ter h_bfdur_Ter 0.23440942
## metab_127 metab_127 0.23353942
## metab_53 metab_53 0.21051382
## metab_100 metab_100 0.19692758
## metab_51 metab_51 0.19375771
## metab_12 metab_12 0.19315162
## metab_29 metab_29 0.18782695
## metab_44 metab_44 0.18591972
## metab_56 metab_56 0.18159051
## metab_151 metab_151 0.18050106
## metab_70 metab_70 0.17974570
## metab_33 metab_33 0.17844071
## metab_144 metab_144 0.17736775
## metab_76 metab_76 0.17525980
## metab_90 metab_90 0.17370247
## metab_148 metab_148 0.16970555
## metab_67 metab_67 0.16965371
## metab_24 metab_24 0.16822606
## metab_174 metab_174 0.16308422
## metab_147 metab_147 0.15895027
## metab_22 metab_22 0.15292928
## metab_133 metab_133 0.15211469
## metab_40 metab_40 0.15157112
## metab_92 metab_92 0.15074862
## metab_123 metab_123 0.14888646
## metab_165 metab_165 0.14807464
## metab_164 metab_164 0.14731732
## metab_155 metab_155 0.14499331
## e3_yearbir_None e3_yearbir_None 0.14418236
## metab_83 metab_83 0.14172014
## metab_31 metab_31 0.13709250
## metab_137 metab_137 0.13339110
## metab_118 metab_118 0.13234372
## metab_10 metab_10 0.13230366
## metab_121 metab_121 0.12935547
## metab_11 metab_11 0.12858949
## metab_73 metab_73 0.12689016
## metab_35 metab_35 0.12243171
## metab_4 metab_4 0.11884420
## metab_135 metab_135 0.11534955
## metab_129 metab_129 0.11186107
## metab_124 metab_124 0.11146572
## h_edumc_None h_edumc_None 0.11049115
## metab_5 metab_5 0.10784973
## metab_131 metab_131 0.10516928
## metab_13 metab_13 0.10255427
## metab_37 metab_37 0.10180845
## metab_156 metab_156 0.09890787
## hs_cd_c_Log2 hs_cd_c_Log2 0.09870967
## metab_173 metab_173 0.09766258
## metab_106 metab_106 0.09743478
## metab_65 metab_65 0.09540889
## metab_61 metab_61 0.09521138
## metab_170 metab_170 0.09392339
## metab_97 metab_97 0.09338812
## metab_88 metab_88 0.09334622
## metab_157 metab_157 0.09152121
## metab_64 metab_64 0.08842711
## metab_38 metab_38 0.08777233
## metab_23 metab_23 0.08689099
## metab_60 metab_60 0.08353259
## metab_41 metab_41 0.08305904
## metab_159 metab_159 0.08300417
## metab_93 metab_93 0.07797704
## metab_111 metab_111 0.07359432
## metab_139 metab_139 0.07306725
## metab_52 metab_52 0.07292251
## metab_103 metab_103 0.07158258
## metab_25 metab_25 0.06911508
## hs_cs_c_Log2 hs_cs_c_Log2 0.06809545
## metab_167 metab_167 0.06693830
## metab_89 metab_89 0.06352953
## metab_63 metab_63 0.06137982
## hs_prpa_cadj_Log2 hs_prpa_cadj_Log2 0.06058699
## metab_69 metab_69 0.05713214
## hs_mnbp_cadj_Log2 hs_mnbp_cadj_Log2 0.05638747
## metab_79 metab_79 0.05580534
## metab_14 metab_14 0.05475433
## metab_45 metab_45 0.05354158
## metab_46 metab_46 0.05327852
## metab_98 metab_98 0.05218432
## h_native_None h_native_None 0.05193607
## hs_total_lipids_Ter hs_total_lipids_Ter 0.05114275
## metab_72 metab_72 0.04900319
## metab_166 metab_166 0.04674223
## metab_108 metab_108 0.04604288
## metab_101 metab_101 0.04502078
## metab_132 metab_132 0.04296251
## metab_175 metab_175 0.03690880
## metab_168 metab_168 0.03521842
## metab_34 metab_34 0.03232148
## metab_169 metab_169 0.03231200
## metab_138 metab_138 0.03048418
## metab_114 metab_114 0.02910522
## metab_21 metab_21 0.02783841
## metab_66 metab_66 0.02621212
## metab_62 metab_62 0.02598841
## metab_84 metab_84 0.02582321
## metab_1 metab_1 0.02509358
## metab_134 metab_134 0.02420686
## metab_80 metab_80 0.02382858
## metab_19 metab_19 0.02061960
## metab_150 metab_150 0.02042669
## hs_org_food_Ter hs_org_food_Ter 0.02023934
## metab_36 metab_36 0.01926574
## metab_9 metab_9 0.01803771
## metab_32 metab_32 0.01711978
## metab_112 metab_112 0.01698640
## metab_107 metab_107 0.01680681
## hs_dep_cadj_Log2 hs_dep_cadj_Log2 0.01596550
## metab_140 metab_140 0.01557540
## metab_87 metab_87 0.01527845
## metab_16 metab_16 0.01442643
## metab_86 metab_86 0.01413670
## hs_total_sweets_Ter hs_total_sweets_Ter 0.01319851
## hs_mibp_cadj_Log2 hs_mibp_cadj_Log2 0.00000000
## hs_dairy_Ter hs_dairy_Ter 0.00000000
## hs_fastfood_Ter hs_fastfood_Ter 0.00000000
## hs_readymade_Ter hs_readymade_Ter 0.00000000
## hs_total_bread_Ter hs_total_bread_Ter 0.00000000
## hs_total_fish_Ter hs_total_fish_Ter 0.00000000
## hs_total_fruits_Ter hs_total_fruits_Ter 0.00000000
## hs_total_potatoes_Ter hs_total_potatoes_Ter 0.00000000
## hs_total_veg_Ter hs_total_veg_Ter 0.00000000
## metab_17 metab_17 0.00000000
## metab_18 metab_18 0.00000000
## metab_42 metab_42 0.00000000
## metab_58 metab_58 0.00000000
## metab_74 metab_74 0.00000000
## metab_125 metab_125 0.00000000
## metab_126 metab_126 0.00000000
## metab_158 metab_158 0.00000000
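Since the analysis is organised around chemical, dietary, metabolomic, and covariate predictors, the relative influence can also be summed per group. The sketch below does this in base R for six rows copied from the table above. The name-based grouping rule (`^metab_`, `_Log2$`, `_Ter$`) is an assumption inferred from the variable names, and `predictor_group()` is a hypothetical helper.

```r
# A few rows copied from the relative-influence table above (not the full table)
rel_inf <- c(
  hs_pcb170_cadj_Log2 = 7.17452557,
  metab_95            = 7.01343903,
  metab_49            = 6.29708960,
  h_cohort            = 2.85498639,
  hs_dde_cadj_Log2    = 2.11660555,
  hs_bakery_prod_Ter  = 0.69131177
)

# Hypothetical naming rule (an assumption, not taken from the source)
predictor_group <- function(var) {
  ifelse(grepl("^metab_", var), "metabolomics",
  ifelse(grepl("_Log2$", var), "chemicals",
  ifelse(grepl("_Ter$", var), "diet", "covariates")))
}

# Sum relative influence within each group
by_group <- tapply(rel_inf, predictor_group(names(rel_inf)), sum)
print(sort(by_group, decreasing = TRUE))
```

With the full table, `setNames(gbm_importance$rel.inf, gbm_importance$var)` would take the place of the hand-copied vector.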
selected_metabolomics_data <- selected_metabolomics_data %>% na.omit()
median_value <- median(selected_metabolomics_data$hs_zbmi_who, na.rm = TRUE)
selected_metabolomics_data$hs_zbmi_who_binary <- ifelse(selected_metabolomics_data$hs_zbmi_who > median_value, 1, 0)
set.seed(101)
trainIndex <- caret::createDataPartition(selected_metabolomics_data$hs_zbmi_who_binary, p = .7, list = FALSE, times = 1)
train_data <- selected_metabolomics_data[trainIndex,]
test_data <- selected_metabolomics_data[-trainIndex,]
train_data_clean <- train_data[complete.cases(train_data), ]
test_data_clean <- test_data[complete.cases(test_data), ]
x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = train_data_clean)[, -1]
y_train <- as.numeric(train_data_clean$hs_zbmi_who_binary)
x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = test_data_clean)[, -1]
y_test <- as.numeric(test_data_clean$hs_zbmi_who_binary)
num_chemicals <- length(chemicals_selected)
num_diet <- length(diet_selected)
num_metabolomics <- ncol(metabol_serum_transposed) - 1 # Excluding ID
num_covariates <- ncol(outcome_and_cov) - 3 # Excluding ID and outcome
# Combine all the lengths
total_length <- num_chemicals + num_diet + num_metabolomics + num_covariates
cat("Total length of predictors:", total_length, "\n")
## Total length of predictors: 216
cat("Number of predictors in x_train:", ncol(x_train), "\n")
## Number of predictors in x_train: 239
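# NOTE: model.matrix() keeps the column order of the data frame (covariates come first,
# as the coefficient output shows) and expands each factor into several dummy columns,
# which is why 216 predictors become 239 columns. These count-based labels therefore
# do not line up with the chemical/diet/metabolomics/covariate blocks of x_train.
group_indices <- c(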
rep(1, num_chemicals), # Group 1: Chemicals
rep(2, num_diet), # Group 2: Postnatal diet
rep(3, num_metabolomics), # Group 3: Metabolomics (excluding ID)
rep(4, num_covariates) # Group 4: Covariates (excluding ID and outcome)
)
if (length(group_indices) < ncol(x_train)) {
group_indices <- c(group_indices, rep(5, ncol(x_train) - length(group_indices)))
} else if (length(group_indices) > ncol(x_train)) {
group_indices <- group_indices[1:ncol(x_train)]
}
cat("Length of group_indices:", length(group_indices), "\n")
## Length of group_indices: 239
cat("Number of columns in x_train:", ncol(x_train), "\n")
## Number of columns in x_train: 239
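The gap between 216 predictors and 239 columns comes from factors being expanded into dummy columns, so group labels built from predictor counts can drift out of alignment with `x_train`. One way to keep them aligned is to derive the group of each column from the term that produced it, using the `assign` attribute of the model matrix. This is a base-R sketch on a made-up toy data frame; the name-based grouping rule and `group_of_term()` are assumptions for illustration. The intercept column is kept and given index `NA`, which, as far as I recall from the grplasso documentation, is how unpenalised coefficients are marked.

```r
# Toy data (illustrative only); column names mimic the study's naming pattern
toy <- data.frame(
  y            = c(0, 1, 0, 1, 1, 0),
  h_cohort     = factor(c(1, 2, 3, 1, 2, 3)),
  hs_pb_c_Log2 = c(0.1, 0.5, -0.2, 0.3, 0.9, -0.4),
  hs_dairy_Ter = factor(c("a", "b", "c", "a", "b", "c")),
  metab_1      = c(1.2, 0.8, 1.1, 0.7, 1.5, 0.9)
)

mm <- model.matrix(y ~ ., data = toy)            # intercept column kept
term_labels <- attr(terms(y ~ ., data = toy), "term.labels")
assign_idx <- attr(mm, "assign")                 # 0 = intercept, k = k-th term
col_term <- c("(Intercept)", term_labels)[assign_idx + 1]

# Hypothetical rule: 1 = chemicals, 2 = diet, 3 = metabolomics, 4 = covariates
group_of_term <- function(term) {
  if (grepl("^metab_", term)) 3L
  else if (grepl("_Log2$", term)) 1L
  else if (grepl("_Ter$", term)) 2L
  else 4L
}

group_index <- vapply(col_term, group_of_term, integer(1), USE.NAMES = FALSE)
group_index[assign_idx == 0] <- NA               # intercept: not penalised

print(data.frame(column = colnames(mm), group = group_index))
```

Every dummy column of a factor inherits the group of its parent variable, so the index vector always has one entry per column of the model matrix.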
group_lasso_model <- grplasso(x_train, y_train, index = group_indices, lambda = 0.1, model = LogReg())
## Couldn't find intercept. Setting center = FALSE.
## Lambda: 0.1 nr.var: 239
coef(group_lasso_model)
## 0.1
## hs_child_age_None -0.814555750
## h_cohort2 3.234390126
## h_cohort3 4.206670924
## h_cohort4 1.776110355
## h_cohort5 3.535594058
## h_cohort6 0.815805203
## e3_sex_Nonemale 0.409834794
## e3_yearbir_None2004 -0.343737604
## e3_yearbir_None2005 1.361351042
## e3_yearbir_None2006 1.252469223
## e3_yearbir_None2007 4.146651290
## e3_yearbir_None2008 3.728110763
## e3_yearbir_None2009 6.196278675
## h_edumc_None2 0.907073602
## h_edumc_None3 0.616745843
## h_native_None1 1.857354603
## h_native_None2 1.201650549
## hs_cd_c_Log2 -0.028984085
## hs_co_c_Log2 0.410669666
## hs_cs_c_Log2 0.354839370
## hs_cu_c_Log2 0.082330485
## hs_hg_c_Log2 0.016901788
## hs_mo_c_Log2 -0.311599261
## hs_pb_c_Log2 0.508267163
## hs_dde_cadj_Log2 0.016035915
## hs_pcb153_cadj_Log2 -0.768239406
## hs_pcb170_cadj_Log2 -0.038885008
## hs_dep_cadj_Log2 -0.098028508
## hs_pbde153_cadj_Log2 -0.126021425
## hs_pfhxs_c_Log2 0.205519000
## hs_pfoa_c_Log2 -0.961203142
## hs_pfos_c_Log2 0.249979788
## hs_prpa_cadj_Log2 -0.025727699
## hs_mbzp_cadj_Log2 0.131623580
## hs_mibp_cadj_Log2 -0.034664908
## hs_mnbp_cadj_Log2 -0.150898534
## h_bfdur_Ter(10.8,34.9] 0.607301430
## h_bfdur_Ter(34.9,Inf] 0.897256516
## hs_bakery_prod_Ter(2,6] -0.543308422
## hs_bakery_prod_Ter(6,Inf] -0.675342550
## hs_dairy_Ter(14.6,25.6] 0.167430517
## hs_dairy_Ter(25.6,Inf] 0.433876848
## hs_fastfood_Ter(0.132,0.5] -0.610485396
## hs_fastfood_Ter(0.5,Inf] -0.506487333
## hs_org_food_Ter(0.132,1] 0.571040806
## hs_org_food_Ter(1,Inf] 0.564690937
## hs_readymade_Ter(0.132,0.5] -0.176643475
## hs_readymade_Ter(0.5,Inf] -0.066547593
## hs_total_bread_Ter(7,17.5] -0.888713877
## hs_total_bread_Ter(17.5,Inf] -0.348798406
## hs_total_fish_Ter(1.5,3] 0.111383056
## hs_total_fish_Ter(3,Inf] 0.154284858
## hs_total_fruits_Ter(7,14.1] 0.172802450
## hs_total_fruits_Ter(14.1,Inf] 0.903091073
## hs_total_lipids_Ter(3,7] 0.625145566
## hs_total_lipids_Ter(7,Inf] 0.807497857
## hs_total_potatoes_Ter(3,4] 0.057915791
## hs_total_potatoes_Ter(4,Inf] -0.060240223
## hs_total_sweets_Ter(4.1,8.5] -0.081010725
## hs_total_sweets_Ter(8.5,Inf] 0.216537710
## hs_total_veg_Ter(6,8.5] 0.460212226
## hs_total_veg_Ter(8.5,Inf] -0.662290016
## metab_1 -0.311895089
## metab_2 0.385905761
## metab_3 -0.292794948
## metab_4 0.015837471
## metab_5 1.568182472
## metab_6 0.574305942
## metab_7 0.340036221
## metab_8 1.267953906
## metab_9 -2.271132612
## metab_10 1.780824593
## metab_11 0.270052036
## metab_12 1.312080337
## metab_13 -1.063086861
## metab_14 -5.846754121
## metab_15 1.284448169
## metab_16 3.034447155
## metab_17 -0.971046154
## metab_18 -3.131430223
## metab_19 -0.669616989
## metab_20 -1.828571982
## metab_21 -0.579589036
## metab_22 0.355017912
## metab_23 -0.696581826
## metab_24 1.182560001
## metab_25 2.669057609
## metab_26 0.023886249
## metab_27 1.432259431
## metab_28 2.458945586
## metab_29 -0.613063489
## metab_30 0.421833304
## metab_31 0.363846354
## metab_32 -0.980666345
## metab_33 0.637697410
## metab_34 -2.244945408
## metab_35 -1.188810242
## metab_36 4.096963657
## metab_37 -0.527705964
## metab_38 -0.545066235
## metab_39 -0.346670443
## metab_40 1.493529695
## metab_41 -0.031066113
## metab_42 -1.350630847
## metab_43 -0.380572837
## metab_44 1.481316441
## metab_45 0.604551202
## metab_46 0.014126441
## metab_47 1.975858237
## metab_48 -4.317396889
## metab_49 1.436981843
## metab_50 -0.190934466
## metab_51 1.155716784
## metab_52 4.708869058
## metab_53 -0.791295980
## metab_54 1.613455004
## metab_55 2.166599257
## metab_56 0.016602408
## metab_57 -4.776328063
## metab_58 4.491685733
## metab_59 -0.888365162
## metab_60 3.863561909
## metab_61 -6.789042293
## metab_62 7.303445379
## metab_63 -0.973589444
## metab_64 0.569662598
## metab_65 -4.591725795
## metab_66 0.122419072
## metab_67 -2.570017083
## metab_68 1.738365943
## metab_69 2.301012623
## metab_70 1.706530275
## metab_71 -2.785950856
## metab_72 -0.385208189
## metab_73 -1.081375044
## metab_74 -1.775137739
## metab_75 0.173707111
## metab_76 1.434844610
## metab_77 0.084220664
## metab_78 1.470554306
## metab_79 0.549792988
## metab_80 5.150695314
## metab_81 1.405265784
## metab_82 -15.557083256
## metab_83 1.907768517
## metab_84 -2.033370533
## metab_85 -3.767492824
## metab_86 2.972664416
## metab_87 7.153643275
## metab_88 1.542898311
## metab_89 -11.095434459
## metab_90 10.024855247
## metab_91 1.739097254
## metab_92 -0.331808414
## metab_93 -4.910208969
## metab_94 -0.009741673
## metab_95 9.342542793
## metab_96 1.654759735
## metab_97 -0.204267915
## metab_98 -3.986788931
## metab_99 -3.172587132
## metab_100 0.409309495
## metab_101 -2.339092968
## metab_102 -3.031017996
## metab_103 0.210266938
## metab_104 2.964893529
## metab_105 -0.595306702
## metab_106 -0.766434869
## metab_107 2.862769887
## metab_108 1.702309555
## metab_109 -0.143626106
## metab_110 -0.020270609
## metab_111 -4.721917618
## metab_112 -0.969457396
## metab_113 5.388124706
## metab_114 8.276250166
## metab_115 1.171360332
## metab_116 -3.714496932
## metab_117 1.392841523
## metab_118 -6.861578166
## metab_119 -2.335442309
## metab_120 -0.708993007
## metab_121 2.477920128
## metab_122 -5.222147710
## metab_123 7.336050211
## metab_124 7.602248915
## metab_125 -2.264330502
## metab_126 0.243967092
## metab_127 -0.458168685
## metab_128 1.667698083
## metab_129 0.263101733
## metab_130 -8.145832818
## metab_131 -6.342202995
## metab_132 -2.801368265
## metab_133 -4.939903271
## metab_134 3.990753266
## metab_135 -5.200872081
## metab_136 -1.494972615
## metab_137 2.988221954
## metab_138 6.449970477
## metab_139 1.353776081
## metab_140 0.687405421
## metab_141 -3.654939081
## metab_142 0.118262553
## metab_143 -0.032516439
## metab_144 5.062581105
## metab_145 -1.767286794
## metab_146 -1.322711932
## metab_147 -0.025789862
## metab_148 0.420414611
## metab_149 -0.414900107
## metab_150 0.688680892
## metab_151 -0.019549962
## metab_152 -0.309537563
## metab_153 -0.778355554
## metab_154 0.163924677
## metab_155 -5.404627485
## metab_156 1.525679547
## metab_157 -1.781301886
## metab_158 2.921214188
## metab_159 -0.096156986
## metab_160 -9.279626651
## metab_161 12.730750032
## metab_162 5.008959913
## metab_163 -3.686396371
## metab_164 -0.283334399
## metab_165 2.695548788
## metab_166 -3.095310226
## metab_167 -0.339090909
## metab_168 -0.804367234
## metab_169 -0.868436696
## metab_170 -0.509452215
## metab_171 -0.877797504
## metab_172 -1.147410567
## metab_173 3.205560239
## metab_174 -1.989846072
## metab_175 -2.670777766
## metab_176 0.903437525
## metab_177 1.295496261
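Group LASSO penalises the Euclidean norm of each group's coefficient block, so a per-group norm is a compact way to read a long coefficient table such as the one above, in which every coefficient is non-zero at lambda = 0.1. The following is a minimal sketch on made-up coefficients and labels; `beta` and `index` are illustrative stand-ins, not the fitted values.

```r
# Illustrative coefficients and group labels (made up, not the fitted model).
beta  <- c(a1 = 0.3, a2 = -0.4, b1 = 0.0, b2 = 0.0, c1 = 1.2)
index <- c(1, 1, 2, 2, 3)

# Euclidean norm of each group's coefficient block.
group_norm <- tapply(beta, index, function(b) sqrt(sum(b^2)))

# Number of non-zero coefficients in each group.
group_nonzero <- tapply(beta != 0, index, sum)

print(data.frame(group   = names(group_norm),
                 l2_norm = as.numeric(group_norm),
                 nonzero = as.integer(group_nonzero)))
```

A group whose norm is exactly zero has been dropped by the penalty as a whole; groups with a non-zero norm are kept in full.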
group_lasso_predictions <- predict(group_lasso_model, newdata = x_test, type = "response")
# convert probabilities to binary predictions
binary_predictions <- ifelse(group_lasso_predictions > 0.5, 1, 0)
accuracy <- mean(binary_predictions == y_test)
cat("Group LASSO Accuracy on Test Set:", accuracy, "\n")
## Group LASSO Accuracy on Test Set: 0.7094972
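# Fix the factor levels so that both classes are always present. caret treats the
# first level (0) as the positive class, so Sensitivity below refers to class 0.
conf_matrix <- confusionMatrix(factor(binary_predictions, levels = c(0, 1)), factor(y_test, levels = c(0, 1)))
conf_matrix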
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 125 50
## 1 54 129
##
## Accuracy : 0.7095
## 95% CI : (0.6595, 0.756)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 6.51e-16
##
## Kappa : 0.419
##
## Mcnemar's Test P-Value : 0.7686
##
## Sensitivity : 0.6983
## Specificity : 0.7207
## Pos Pred Value : 0.7143
## Neg Pred Value : 0.7049
## Prevalence : 0.5000
## Detection Rate : 0.3492
## Detection Prevalence : 0.4888
## Balanced Accuracy : 0.7095
##
## 'Positive' Class : 0
##
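Because the positive class reported above is 0, the printed sensitivity describes children at or below the median BMI z-score. The sketch below recomputes the metrics from the printed counts with either class as positive; the helper `class_metrics` is a hypothetical name introduced for illustration.

```r
# Confusion matrix printed above: rows = prediction, columns = reference.
cm <- matrix(c(125, 54, 50, 129), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Hypothetical helper: sensitivity, specificity and accuracy for a chosen
# positive class of a 2 x 2 confusion matrix.
class_metrics <- function(cm, positive) {
  negative <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]
  fn <- cm[negative, positive]
  fp <- cm[positive, negative]
  tn <- cm[negative, negative]
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = (tp + tn) / sum(cm))
}

m0 <- class_metrics(cm, positive = "0")
m1 <- class_metrics(cm, positive = "1")
print(round(rbind(positive_0 = m0, positive_1 = m1), 4))
```

Switching the positive class swaps sensitivity and specificity and leaves accuracy unchanged, so the choice only affects how the table is read.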
# ROC Curve and AUC
roc_curve <- roc(y_test, group_lasso_predictions)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Group LASSO Model (with metabolomics)")
auc_value <- auc(roc_curve)
cat("Group LASSO AUC on Test Set:", auc_value, "\n")
## Group LASSO AUC on Test Set: 0.7683593
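The AUC reported above can be read as the probability that a randomly chosen case (label 1) receives a higher predicted probability than a randomly chosen control (label 0). A minimal base-R sketch of that rank-based definition follows, using toy labels and scores rather than the model's predictions; `rank_auc` is a hypothetical helper, and its result is expected to agree with the trapezoidal AUC for the direction `controls < cases`.

```r
# Rank-based (Mann-Whitney) AUC: share of case/control pairs in which the
# case has the higher score; ties count one half via average ranks.
rank_auc <- function(labels, scores) {
  r  <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy data (illustrative only).
labels <- c(0, 0, 1, 1, 0, 1)
scores <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90)

auc_toy <- rank_auc(labels, scores)
print(auc_toy)
```

In the toy data eight of the nine case/control pairs are ordered correctly, which gives an AUC of 8/9.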
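# Group LASSO without metabolomics: rebuild the binary outcome and the split.
finalized_data <- finalized_data %>% na.omit()
# Dichotomise the BMI z-score at the sample median (computed before the train/test split).
median_value <- median(finalized_data$hs_zbmi_who, na.rm = TRUE)
finalized_data$hs_zbmi_who_binary <- ifelse(finalized_data$hs_zbmi_who > median_value, 1, 0)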
set.seed(101)
trainIndex <- createDataPartition(finalized_data$hs_zbmi_who_binary, p = .7, list = FALSE, times = 1)
train_data <- finalized_data[trainIndex,]
test_data <- finalized_data[-trainIndex,]
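The threshold above is the median of all rows, so the test rows contribute to the definition of the outcome. One possible variant, sketched here on simulated data, is to split first and estimate the cutoff from the training rows only. The column `z` is a stand-in for `hs_zbmi_who`, and a plain random split replaces `createDataPartition()` to keep the example in base R; both are assumptions made for illustration.

```r
set.seed(101)
toy <- data.frame(z = rnorm(200))   # stand-in for hs_zbmi_who

# Split first: plain random 70/30 split.
train_rows <- sample(nrow(toy), size = round(0.7 * nrow(toy)))
train <- toy[train_rows, , drop = FALSE]
test  <- toy[-train_rows, , drop = FALSE]

# Estimate the cutoff from the training rows only, then apply it to both sets.
cutoff <- median(train$z)
train$z_binary <- as.integer(train$z > cutoff)
test$z_binary  <- as.integer(test$z > cutoff)

print(c(cutoff     = cutoff,
        train_rate = mean(train$z_binary),
        test_rate  = mean(test$z_binary)))
```

With this variant the training classes are balanced by construction, while the test-set class balance may drift slightly from 50%.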
train_data_clean <- train_data[complete.cases(train_data), ]
x_train <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = train_data_clean)[,-1]
y_train <- as.numeric(train_data_clean$hs_zbmi_who_binary)
test_data_clean <- test_data[complete.cases(test_data), ]
x_test <- model.matrix(hs_zbmi_who_binary ~ . - hs_zbmi_who, data = test_data_clean)[,-1]
y_test <- as.numeric(test_data_clean$hs_zbmi_who_binary)
num_chemicals <- length(chemicals_selected)
num_diet <- length(diet_selected)
num_covariates <- ncol(outcome_and_cov) - 2 # excluding outcome and binary outcome
total_length <- num_chemicals + num_diet + num_covariates
group_indices <- c(
  rep(1, num_chemicals),  # Group 1: Chemicals
  rep(2, num_diet),       # Group 2: Postnatal diet
  rep(3, num_covariates)  # Group 3: Covariates (excluding outcome)
)
length(group_indices) == ncol(x_train)
## [1] FALSE
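# The design matrix has more columns than group labels (factor variables expand
# into several dummy columns), so pad the remainder with a catch-all group 4.
# This matches lengths only; it does not align each label with its column.
if (length(group_indices) < ncol(x_train)) {
  group_indices <- c(group_indices, rep(4, ncol(x_train) - length(group_indices)))
}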
length(group_indices) == ncol(x_train)
## [1] TRUE
# x_train has no intercept column (dropped by [,-1] above), so grplasso fits without an intercept.
group_lasso_model <- grplasso(x_train, y_train, index = group_indices, lambda = 0.1, model = LogReg())
## Couldn't find intercept. Setting center = FALSE.
## Lambda: 0.1 nr.var: 60
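Both fits use a single fixed penalty, lambda = 0.1. A hedged sketch of how a penalty path could be explored instead: build a decreasing, log-spaced grid below the largest useful penalty and fit along it. The helper `lambda_grid`, the simulated data and the grid size are assumptions for illustration; `lambdamax()`, `LogReg()` and `grpl.control()` are functions of the grplasso package, and that part only runs if the package is installed.

```r
# Hypothetical helper: decreasing log-spaced grid from lambda_max down to
# ratio * lambda_max.
lambda_grid <- function(lambda_max, n = 10, ratio = 0.01) {
  lambda_max * ratio^seq(0, 1, length.out = n)
}

grid_demo <- lambda_grid(50, n = 5)
print(grid_demo)

# Optional demonstration on simulated data (skipped without grplasso).
if (requireNamespace("grplasso", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  x <- cbind(1, matrix(rnorm(n * 4), n, 4))   # column of ones = intercept
  y <- rbinom(n, 1, plogis(x[, 2] - x[, 3]))
  index <- c(NA, 1, 1, 2, 2)                  # NA marks the unpenalised intercept

  # Penalty at which all penalised groups are shrunk to zero.
  lmax <- grplasso::lambdamax(x, y = y, index = index,
                              model = grplasso::LogReg())

  fit <- grplasso::grplasso(x, y = y, index = index,
                            lambda = lambda_grid(lmax, n = 5),
                            model = grplasso::LogReg(),
                            control = grplasso::grpl.control(trace = 0))

  # Number of non-zero penalised coefficients at each lambda.
  print(colSums(fit$coefficients[-1, , drop = FALSE] != 0))
}
```

Including a column of ones with index `NA` lets grplasso find an unpenalised intercept. Choosing among the grid values would still require a validation set or cross-validation, which is not shown here.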
group_lasso_coef <- coef(group_lasso_model)
print(group_lasso_coef)
## 0.1
## e3_sex_Nonemale 0.215800640
## e3_yearbir_None2004 -0.313105353
## e3_yearbir_None2005 0.147831260
## e3_yearbir_None2006 0.399591772
## e3_yearbir_None2007 0.662232475
## e3_yearbir_None2008 0.820956883
## e3_yearbir_None2009 1.535090138
## h_edumc_None2 0.357957282
## h_edumc_None3 0.332942693
## h_cohort2 1.769290964
## h_cohort3 1.861416323
## h_cohort4 1.289192363
## h_cohort5 0.736127336
## h_cohort6 0.819950804
## hs_child_age_None -0.251413569
## h_bfdur_Ter(10.8,34.9] 0.022481336
## h_bfdur_Ter(34.9,Inf] 0.421746681
## hs_bakery_prod_Ter(2,6] -0.367140841
## hs_bakery_prod_Ter(6,Inf] -0.665749368
## hs_dairy_Ter(14.6,25.6] 0.176559184
## hs_dairy_Ter(25.6,Inf] -0.081425257
## hs_fastfood_Ter(0.132,0.5] 0.121228175
## hs_fastfood_Ter(0.5,Inf] 0.067584836
## hs_org_food_Ter(0.132,1] 0.103549297
## hs_org_food_Ter(1,Inf] 0.072419958
## hs_readymade_Ter(0.132,0.5] -0.025717192
## hs_readymade_Ter(0.5,Inf] 0.003605485
## hs_total_bread_Ter(7,17.5] -0.227202592
## hs_total_bread_Ter(17.5,Inf] -0.145018021
## hs_total_fish_Ter(1.5,3] -0.037890718
## hs_total_fish_Ter(3,Inf] 0.198059213
## hs_total_fruits_Ter(7,14.1] 0.188427962
## hs_total_fruits_Ter(14.1,Inf] 0.167196028
## hs_total_lipids_Ter(3,7] -0.146801779
## hs_total_lipids_Ter(7,Inf] -0.187460509
## hs_total_potatoes_Ter(3,4] -0.021779148
## hs_total_potatoes_Ter(4,Inf] -0.017630035
## hs_total_sweets_Ter(4.1,8.5] -0.200656559
## hs_total_sweets_Ter(8.5,Inf] -0.011508070
## hs_total_veg_Ter(6,8.5] 0.069869371
## hs_total_veg_Ter(8.5,Inf] -0.142810762
## hs_cd_c_Log2 -0.008229695
## hs_co_c_Log2 0.009914760
## hs_cs_c_Log2 0.393405877
## hs_cu_c_Log2 0.456205069
## hs_hg_c_Log2 0.009165825
## hs_mo_c_Log2 -0.207038128
## hs_pb_c_Log2 -0.162471124
## hs_dde_cadj_Log2 -0.150863328
## hs_pcb153_cadj_Log2 -0.742048287
## hs_pcb170_cadj_Log2 -0.103155838
## hs_dep_cadj_Log2 -0.042837953
## hs_pbde153_cadj_Log2 -0.056592257
## hs_pfhxs_c_Log2 0.091695788
## hs_pfoa_c_Log2 -0.354839120
## hs_pfos_c_Log2 0.022225709
## hs_prpa_cadj_Log2 -0.023211134
## hs_mbzp_cadj_Log2 0.175002499
## hs_mibp_cadj_Log2 -0.115639713
## hs_mnbp_cadj_Log2 -0.089607981
group_lasso_predictions <- predict(group_lasso_model, newdata = x_test, type = "response")
binary_predictions <- ifelse(group_lasso_predictions > 0.5, 1, 0)
accuracy <- mean(binary_predictions == y_test)
cat("Group LASSO Accuracy on Test Set:", accuracy, "\n")
## Group LASSO Accuracy on Test Set: 0.6512821
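# As before, fix the factor levels; the first level (0) is caret's positive class.
conf_matrix <- confusionMatrix(factor(binary_predictions, levels = c(0, 1)), factor(y_test, levels = c(0, 1)))
conf_matrix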
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 129 76
## 1 60 125
##
## Accuracy : 0.6513
## 95% CI : (0.6017, 0.6986)
## No Information Rate : 0.5154
## P-Value [Acc > NIR] : 4.022e-08
##
## Kappa : 0.3037
##
## Mcnemar's Test P-Value : 0.1984
##
## Sensitivity : 0.6825
## Specificity : 0.6219
## Pos Pred Value : 0.6293
## Neg Pred Value : 0.6757
## Prevalence : 0.4846
## Detection Rate : 0.3308
## Detection Prevalence : 0.5256
## Balanced Accuracy : 0.6522
##
## 'Positive' Class : 0
##
roc_curve <- roc(y_test, group_lasso_predictions)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Group LASSO Model (without metabolomics)")
auc_value <- auc(roc_curve)
cat("Group LASSO AUC on Test Set:", auc_value, "\n")
## Group LASSO AUC on Test Set: 0.7146279
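The two models can be set side by side from the counts and AUC values printed above. The sketch below rebuilds the accuracies from the two confusion matrices and attaches an exact binomial 95% interval to each. Note that the test sets differ in size (358 versus 390 rows), so this is an informal, unpaired comparison rather than a formal test.

```r
# Correct and total test-set predictions read off the two confusion matrices,
# plus the AUC values printed in the report.
results <- data.frame(
  model   = c("with metabolomics", "without metabolomics"),
  correct = c(125 + 129, 129 + 125),
  total   = c(125 + 50 + 54 + 129, 129 + 76 + 60 + 125),
  auc     = c(0.7683593, 0.7146279)
)
results$accuracy <- results$correct / results$total

# Exact (Clopper-Pearson) 95% interval for each accuracy.
ci <- t(sapply(seq_len(nrow(results)), function(i) {
  binom.test(results$correct[i], results$total[i])$conf.int
}))
results$ci_low  <- ci[, 1]
results$ci_high <- ci[, 2]

print(results, digits = 4)
```

The model with metabolomics has the higher accuracy and AUC on its test set, but the accuracy intervals reported above, (0.6595, 0.756) and (0.6017, 0.6986), overlap, so the difference should be interpreted with caution.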